Use " or ' instead of | as quote delimiter #414

Closed: eemeli proposed merging 1 commit.

eemeli (Collaborator) commented Jul 7, 2023

Please see the following discussions for context on how we ended up with |quotes|:

Now that we've made it possible to use unquoted values as operands, I realise that my opinion on this topic has changed, and so I'm proposing that we allow both 'quoted' and "quoted" literals. I have three reasons for this:

  1. With the ability to write an expression like {42 :number}, I'm much less bothered by using string-like quote characters to delimit literal values, particularly if (or once) we also allow negative numbers to be unquoted.
  2. Fundamentally, I believe that this would make our syntax less weird. We've spent quite a bit of effort on ensuring that MF2 syntax is relatively readable, so that people don't feel they need to read the spec to understand what a message means or how to write one. Using | as the literal delimiter (or parentheses, before the current choice) is rather novel, and that's not a good thing.
  3. Allowing either ' or " should reduce the need to escape these delimiters either due to characters in the literal value, or due to the delimiters of a surrounding resource format.
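To make point 3 concrete, here is a minimal TypeScript sketch of how a tool might pick whichever of ' or " needs fewer escapes for a given value. The helper names (escapeCount, quoteLiteral) are hypothetical, and the escaping rule it assumes (a backslash before the active delimiter or before a backslash) is an illustration only, not the MF2 grammar.

```typescript
// Hypothetical helpers, not part of any MF2 implementation.
// Assumption (illustration only): inside a quoted literal, the active
// delimiter and the backslash itself are escaped with a preceding backslash.
type Delimiter = "'" | '"';

// Number of characters in `value` that would need escaping with `delim`.
function escapeCount(value: string, delim: Delimiter): number {
  let count = 0;
  for (const ch of value) {
    if (ch === delim || ch === "\\") count++;
  }
  return count;
}

// Quote `value`, preferring " unless ' would need strictly fewer escapes.
function quoteLiteral(value: string): string {
  const delim: Delimiter =
    escapeCount(value, "'") < escapeCount(value, '"') ? "'" : '"';
  let body = "";
  for (const ch of value) {
    body += ch === delim || ch === "\\" ? "\\" + ch : ch;
  }
  return delim + body + delim;
}

console.log(quoteLiteral("horse"));       // "horse"
console.log(quoteLiteral('say "hi"'));    // 'say "hi"'
console.log(quoteLiteral(`it's "fine"`)); // 'it\'s "fine"'
```

The trade-off, raised later in the thread, is that either delimiter can still collide with the quoting rules of a surrounding file format or programming language.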

One negative consequence of this proposed change is that a little more care may be needed when moving the source of a message from one encapsulation to another. I think this is well justified given the benefits.

@@ -47,12 +49,15 @@ unquoted-start = name-start / DIGIT / "."
 reserved = ( reserved-start / private-start ) reserved-body
 reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
 private-start = "^" / "&"
-reserved-body = *( [s] 1*(reserved-char / reserved-escape / literal))
+reserved-body = *( [s] 1*(reserved-char / reserved-escape / quoted))
eemeli (Collaborator Author):

This is an unrelated fix that I noticed while working on this PR.

A Member replied:

Please make a separate PR for this.

The intention was to allow reserved to define whatever structure it wants. For example, allowing "comments" might look something like:

{% This is a "comment". Do not translate it}
{$foo % This is a "comment" about $foo: do not translate it}

The contents of reserved must appear within a placeholder, so the characters { and } must be escaped, but everything else (including quotes, whichever ones we use) remains undefined.

Hence reserved-char should match text-char. The production reserved-char currently omits whitespace, but the optional whitespace part of reserved-body just adds the spaces back.

eemeli (Collaborator Author):

Done: #415.

aphillips (Member):

(as contributor):
I heartily agree with your reasoning about using normal quotation marks. After all this time, |horse| just looks weird to me.

However, I note that our syntax will be embedded into file formats (I cited JSON in the #263 thread) or into code. These syntaxes already use ' or " for quotes, so ours will have to be escaped in those contexts. Indeed, I believe the reason you want to allow both quote characters is that JS allows such quote alternation, but other syntaxes are stricter about using one or the other and do not. Messages being ported between runtime environments with different quoting rules will be prone to failures due to overzealous and underzealous escaping.
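As an illustration of the JSON case, the TypeScript sketch below serializes the same literal placeholder, spelled with each candidate delimiter, using the standard JSON.stringify and counts the escapes the encoder adds. The placeholder strings are modelled loosely on the draft syntax under discussion and are for illustration only; asJsonEntry and backslashCount are hypothetical helper names.

```typescript
// Three spellings of the same literal placeholder (illustrative only).
const pipeQuoted = "{|horse|}";
const doubleQuoted = '{"horse"}';
const singleQuoted = "{'horse'}";

// Embed a message as a JSON string value, using standard JSON escaping.
function asJsonEntry(message: string): string {
  return JSON.stringify({ msg: message });
}

// Count backslashes, i.e. escapes added by the JSON encoder
// (none of the inputs above contain a backslash themselves).
function backslashCount(text: string): number {
  return text.split("\\").length - 1;
}

for (const message of [pipeQuoted, doubleQuoted, singleQuoted]) {
  const json = asJsonEntry(message);
  console.log(json, backslashCount(json));
}
// {"msg":"{|horse|}"} 0
// {"msg":"{\"horse\"}"} 2
// {"msg":"{'horse'}"} 0
```

Single quotes avoid escaping in JSON, but they would need escaping in any host syntax that delimits strings with ' (for example, a single-quoted JavaScript string literal), which is the porting hazard described above.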

(as chair):
This issue was previously resolved in February after long debate. This cannot be merged until we discuss it as a group. You need to write up the pros/cons for consideration.

aphillips added the syntax (Issues related with syntax or ABNF) and Agenda+ (Requested for upcoming teleconference) labels on Jul 7, 2023
stasm (Collaborator) commented Jul 7, 2023

I think I understand @eemeli’s position, but my current view is to oppose this change.

The premise of the presented argument seems to be that the current delimiter syntax is unusual. That’s true, but more crucially, I’d say our unquoted literal syntax is even more unusual. So we’re already in the realm of “weirdness”, and extending it a bit by choosing | as the delimiter seems reasonable to me given the “why”.

We expect quoted literals to be rare. We tend to draw two corollaries from this rareness argument, and they contradict each other:

  1. Because it’s rare (we’ve taken care to optimize for the common case), it’s OK if the syntax for the rare case is a bit exotic.

  2. Because it’s rare, people will use it less often, and won’t always remember the correct syntax if the syntax is too exotic.

I think that just looking at these two points of view won’t lead us to a consensus. That’s where trade-offs and use-cases come into play.

Using regular quotes as delimiters comes with a cost: embedding into most other programming languages and container formats will require caution. Using exotic delimiters prevents this one class of issues.

Exotic syntax is OK if it’s well justified. I think users will understand and learn to use it.

mihnita (Collaborator) commented Jul 10, 2023

I 100% agree with the "weirdness" part.

But I think the original reasons for | were that:

  • we need some kind of quoting, because quoted values may contain spaces
  • we didn't like " or ' because they conflict with a lot of existing file formats

Since none of that has changed, I would not change this.

TLDR: same concerns as Addison & Stas

cdaringe (Contributor) commented Jul 19, 2023

+0. Voicing pure neutrality on this one. This is 🎨. The OCaml'er and PL/pgSQL'er in me is fine with the pipes. The developer in me who uses more common languages (TS, Rust) is fine with the quotes. We're in a DSL. I like to think that consumers of DSLs have a natural intuition that "DSL rules apply here, not my own presumptions", but given how closely MF2 is paired with programming concepts, it's a nice byproduct to have literal grouping aligned between paradigms. " is the unambiguous, ubiquitous character for bounding text content in written language, so I'd lean that way, but not strongly enough to take a firm stance.

eemeli (Collaborator Author) commented Oct 8, 2023

Converting this to draft until #477 is eventually accepted, at which point this can be rebased or abandoned.

eemeli marked this pull request as draft on October 8, 2023, 10:02
aphillips (Member):

Closed per telecon of 2023-11-27

aphillips closed this on Nov 27, 2023

5 participants