Use (parentheses) rather than "quotes" around literal values #276

Merged
eemeli merged 3 commits into unicode-org:develop from the parenthetically branch
Jun 7, 2022

eemeli
Collaborator

@eemeli eemeli commented May 20, 2022

Closes #263 by replacing "quotes" with (parentheses) as delimiters for literal values. Correspondingly, the String token is renamed as Literal, though it's still expected to always be parsed into the data model as a string value.

Within a Literal, both ( and ) require escaping. Strictly this is only required for the closing parenthesis, but not also escaping the start looks odd, and should improve the experience for anyone looking at MF2 source in an editor that uses generic parenthesis-matching highlighting.
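
A minimal sketch of the escaping rule described above, written in Python for illustration only. It assumes a backslash is the escape character (as in the `\(CET\)` example later in this thread) and that a literal backslash is escaped by doubling; the second point is an assumption, not something this PR states. The function names are hypothetical.

```python
def escape_literal(value: str) -> str:
    """Wrap a string value in parentheses, escaping both ( and ).

    Assumption: a literal backslash is also escaped, by doubling it.
    """
    out = []
    for ch in value:
        if ch in "\\()":
            out.append("\\")
        out.append(ch)
    return "(" + "".join(out) + ")"


def unescape_literal(source: str) -> str:
    """Recover the string value from a parenthesised Literal.

    Follows the stricter rule proposed in the PR: an unescaped ( or )
    inside the Literal is rejected, even though only ) is ambiguous.
    """
    if len(source) < 2 or source[0] != "(" or source[-1] != ")":
        raise ValueError("literal must be wrapped in parentheses")
    body = source[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1
            if i >= len(body) or body[i] not in "\\()":
                raise ValueError("invalid escape sequence")
            out.append(body[i])
        elif ch in "()":
            raise ValueError("unescaped parenthesis in literal")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For example, `escape_literal("GMT+0100 (CET)")` yields `(GMT+0100 \(CET\))`, and `unescape_literal` turns that back into the original string.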

Member

@aphillips aphillips left a comment

I'm okay with this change, but remain uneasy about using such a common character pair. I buy @stasm's arguments about using separate open/close characters; otherwise I'd suggest using backtick (U+0060 `). Or maybe use < and >? Those aren't as common in "normal" text (although they are not HTML friendly).

@eemeli
Collaborator Author

eemeli commented May 20, 2022

One challenge with backticks ` is that they're commonly used to indicate template or "code" strings. For instance in many markdown variants they require doubling the surrounding backticks to be escaped; I needed to use `` ` `` to get one to render right. They're also often visually difficult to distinguish from single quotes, so may make `mistakes' difficult to spot.

Angle brackets < > would need to be escaped in XML & HTML environments, which are rather common.

Hence parentheses ( ) as something like a least-worst option.
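
To make the XML/HTML point concrete, here is a small Python sketch using the standard library's `html.escape`. The message strings are hypothetical: only the parenthesised form reflects this PR, and the other delimiters are shown purely for comparison.

```python
import html


def needs_markup_escaping(message: str) -> bool:
    """True if the message changes when embedded in HTML/XML text or attributes."""
    return html.escape(message, quote=True) != message


# Hypothetical messages that differ only in the literal delimiter.
candidates = {
    "quotes": '{"Thu Jan 01 1970": datetime}',
    "angle brackets": "{<Thu Jan 01 1970>: datetime}",
    "parentheses": "{(Thu Jan 01 1970): datetime}",
}

for name, message in candidates.items():
    print(name, needs_markup_escaping(message), html.escape(message, quote=True))
```

Under these assumptions, the quoted and angle-bracket forms are rewritten with character references (`&quot;`, `&lt;`, `&gt;`), while the parenthesised form passes through unchanged.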

@stasm
Collaborator

stasm commented May 20, 2022

FWIW, in #263 I was trying to say that for literals we might not need separate open/close delimiters, like I'm advocating for in case of patterns. I also have a concern about angle brackets which I shared in #263 (I'm trying to separate the discussion about the design from the discussion about the implementation).

@mihnita
Collaborator

mihnita commented May 20, 2022

I don't think that round brackets are much better than "

It means that they become "something to escape", because they can be encountered in plain text.
(I don't know if more often than " or not)

TLDR: same as Addison, "remain uneasy about using such a common character pair."


I don't think that < ... > works either, as it conflicts with many existing "storage formats"

Java can use .xml for resources, and some prefer them to .properties because they are Unicode.
.NET uses .xml files for localization (resx files)
The Apple stringdict file format used for localization is also xml.
Android uses xml for translation.

The same goes for HTML-like templating. I know of several localization systems for HTML that "embed" the localizable strings in HTML:
<p i18n=true>Hello {$user}</p>

@stasm
Collaborator

stasm commented May 20, 2022

> It means that they become "something to escape", because they can be encountered in plain text.

There are two "directions" of escaping that we should be considering.

  • Inside: what if I want to use the literal's delimiter as part of the literal value?
  • Outside: does the literal's delimiter collide with the container's delimiter for strings, or other special characters in the container?

Choosing something other than the single quote, double quote, backtick or angle brackets optimizes for the "outside".
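
A rough way to compare the two directions, sketched in Python. The "outside" container here is assumed to be a JSON string, which is only one of many possible storage formats, and both helper names are hypothetical.

```python
import json


def inside_escapes(value: str, open_ch: str, close_ch: str) -> int:
    """Count characters in the literal value that would need a backslash
    escape if the value were delimited by open_ch and close_ch."""
    return sum(1 for ch in value if ch in (open_ch, close_ch, "\\"))


def outside_escapes(message: str) -> int:
    """Count the extra characters json.dumps adds beyond the two
    surrounding quotes when the message is stored as a JSON string."""
    encoded = json.dumps(message, ensure_ascii=False)
    return len(encoded) - len(message) - 2


value = "GMT+0100 (CET)"
print(inside_escapes(value, "(", ")"))    # parentheses collide with the value
print(inside_escapes(value, '"', '"'))    # quotes do not
print(outside_escapes('{"x": datetime}'))  # quotes collide with the JSON container
print(outside_escapes("{(x): datetime}"))  # parentheses do not
```

Note that once a parenthesised literal contains backslash escapes, those backslashes themselves need escaping in a JSON container, so the two directions are not fully independent.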

You're right that there may be cases when an "inside" conflict happens. The PR actually makes one of them clear in one of the examples:

{(Thu Jan 01 1970 14:37:00 GMT+0100 \(CET\)): datetime weekday=long}

Do you think a pipe would be a better choice here? Or any other character?

{|Thu Jan 01 1970 14:37:00 GMT+0100 (CET)|: datetime weekday=long}
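
The two spellings above can be compared with a small delimiter-agnostic scanner, sketched here in Python. This is not the MF2 parser: `read_literal` is a hypothetical helper, backslash is assumed to escape the next character, and an unescaped opening delimiter inside the literal is tolerated, since only the closing one is strictly ambiguous.

```python
from typing import Tuple


def read_literal(src: str, start: int, open_ch: str, close_ch: str) -> Tuple[str, int]:
    """Read a delimited literal beginning at src[start].

    Returns the unescaped value and the index just past the closing delimiter.
    """
    if start >= len(src) or src[start] != open_ch:
        raise ValueError("expected opening delimiter")
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            if i + 1 >= len(src):
                raise ValueError("dangling backslash")
            out.append(src[i + 1])  # keep the escaped character verbatim
            i += 2
        elif ch == close_ch:
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated literal")


paren = r"{(Thu Jan 01 1970 14:37:00 GMT+0100 \(CET\)): datetime weekday=long}"
pipe = "{|Thu Jan 01 1970 14:37:00 GMT+0100 (CET)|: datetime weekday=long}"

value_paren, end_paren = read_literal(paren, 1, "(", ")")
value_pipe, end_pipe = read_literal(pipe, 1, "|", "|")
```

Both calls recover the same value, `Thu Jan 01 1970 14:37:00 GMT+0100 (CET)`; the difference is only in how much escaping the source needs for this particular string.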

@stasm
Collaborator

stasm commented May 20, 2022

FWIW, I think I'm OK with {(Thu Jan 01 1970 14:37:00 GMT+0100 \(CET\)): datetime weekday=long} as a trade-off.

Co-authored-by: Addison Phillips <addison@lab126.com>
@eemeli eemeli force-pushed the parenthetically branch from 886ba8a to f7a6193 Compare May 25, 2022 09:08
@eemeli
Collaborator Author

eemeli commented May 25, 2022

Rebased to account for changes in other merged PRs.

@mihnita Did you have concerns about preferring "..." over (...) for literals?

Member

@markusicu markusicu left a comment

lgtm

Co-authored-by: Markus Scherer <markus.icu@gmail.com>
@romulocintra
Collaborator

@eemeli needs rebase but ok to merge

@eemeli eemeli merged commit 19d581b into unicode-org:develop Jun 7, 2022
@eemeli eemeli deleted the parenthetically branch June 7, 2022 00:22
echeran pushed a commit that referenced this pull request Sep 20, 2022
Co-authored-by: Addison Phillips <addison@lab126.com>
Co-authored-by: Markus Scherer <markus.icu@gmail.com>
6 participants