Skip to content

Implement code more introducer #529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Dec 1, 2023
Merged

Implement code more introducer #529

merged 12 commits into from
Dec 1, 2023

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented Nov 21, 2023

This starts implementing the changes for the code mode introducer (#521, #526), including dropping when.

Before I get any further, it might be best for us to align on the exact ABNF changes. See line comments for a few specific questions.

@eemeli eemeli requested review from aphillips and stasm November 21, 2023 10:23
@aphillips
Copy link
Member

I was hoping to land the namespace changes before embarking on this, which is why I didn't do it yesterday. But worth resolving stuff before we change the ABNF for reals.

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the ABNF hard to understand.

Future keywords will be lowercase ASCII, so require an unquoted pattern starting with . to be escaped or quoted.

@eemeli
Copy link
Collaborator Author

eemeli commented Nov 21, 2023

Based on feedback from @aphillips and @gibson042, I changed it so a leading . in a message always needs to be escaped, lest it be introducing a complex-message keyword.

Copy link
Collaborator

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the grammar is stable, don't forget to update syntax.md as well.

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making progress. Some of my suggestions take a different route to the same destination for reasons I call out. Pay attention to the whitespace issue.


complex-message = "{{" [s] *(declaration [s]) body [s] "}}"
simple-message = [simple-start pattern]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the [/] is super-subtle and may be worth a comment. These are there to make the empty string a valid message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's make sure this is called out in syntax.md.

@eemeli
Copy link
Collaborator Author

eemeli commented Nov 22, 2023

Updated again. I dropped text, as with different handling for the first character there's now no way for a single ABNF rule to correspond with what we'd like to consider text in the syntax. So including such a rule would be misleading.

I'd also strongly prefer not to add comments to the message.abnf file, as we already have its annotated version in syntax.md. OTOH, I added one for reserved-keyword, as its ABNF rule is only an approximation of what it should be (see #529 (review) above), and it will cause automated parser generators to fail with this syntax.

; Note that the following expression is a simplification,
; as this rule MUST NOT be considered to match existing keywords:
; (input / local / match) (s / "{")
reserved-statement = "." reserved-body 1*expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think requiring reserved-statement to end with at least one expression matches too little.

Suggested change
reserved-statement = "." reserved-body 1*expression
; A reserved statement continues until the next statement or quoted-pattern,
; and may include any number of expressions and/or quoted literals
reserved-statement = "." *([s] reserved-statement-part)
reserved-statement-part = 1*(reserved-statement-char / reserved-statement-escape)
/ expression
/ quoted
reserved-statement-char = %x00-08 ; omit `s`: HTAB and LF
/ %x0B-0C ; omit `s`: CR
/ %x0E-19 ; omit `s`: SP
/ %x21-2D ; omit .
/ %x2F-5B ; omit \
/ %x5D-7A ; omit { | }
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF
reserved-statement-escape = backslash ( %x00-D7FF / %xE000-10FFFF )

Note that this locks in the expression syntax for effectively forever because a left brace could start a {{…}} quoted expression which would terminate the reserved statement—though possible to work around that too; it's rather impractical. And likewise for quoted.

Copy link
Collaborator Author

@eemeli eemeli Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The . exclusion also disallows using name within a future statement, so e.g. introducing local-declaration would not be possible if we didn't include it already.

On the other hand, if each declaration had to end in a required whitespace s, we would only need to bar the sequence (s ".") from what's reserved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibson042 Please correct me if I misremember, but my understanding from today is that we're fine with reserving only statements that end in an expression?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibson042 Please correct me if I misremember, but my understanding from today is that we're fine with reserving only statements that end in an expression?

That is not my recollection, and would preclude possibilities like .strict true. However, I'm fine with addressing it in a followup.

@eemeli
Copy link
Collaborator Author

eemeli commented Nov 27, 2023

@aphillips I'd like to discuss the open threads here on today's call, if possible:

  1. Confirm that we don't want to require any extra whitespace. (thread)
  2. Do we allow \. generally in text, or only in very specific places, or not at all? (thread)
  3. Should we include a text rule, even if text will mean something different? (thread)
  4. Is it enough to reserve statements that end with an expression? (thread)

@eemeli eemeli added the Agenda+ Requested for upcoming teleconference label Nov 27, 2023
@eemeli eemeli marked this pull request as ready for review November 27, 2023 22:05
@eemeli eemeli requested a review from aphillips November 27, 2023 22:06
@eemeli
Copy link
Collaborator Author

eemeli commented Nov 27, 2023

I've now resolved the discussions as agreed, and dropped the \. escapes (see links in previous comment). Please review?

During the call, we planned to merge this provided that it receives some approvals, and for a separate PR to apply the further spec changes.

@eemeli eemeli added syntax Issues related with syntax or ABNF specification Issue affects the specification and removed Agenda+ Requested for upcoming teleconference labels Nov 27, 2023
Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

૮꒰ ˶• ༝ •˶꒱ა

Sheep it.

Copy link
Collaborator

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR still has a few issues that IMO should be fixed before it lands.

Comment on lines +12 to +13
complex-body = quoted-pattern
/ ((selectors / reserved-statement) 1*([s] variant))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may prove valuable to establish a greater separation between selectors and reserved-statement here.

Suggested change
complex-body = quoted-pattern
/ ((selectors / reserved-statement) 1*([s] variant))
complex-body = quoted-pattern
/ (selectors 1*([s] variant))
/ (reserved-statement 1*([s] variant))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say why? The two expressions are wholly synonymous, and I don't really see why we should prefer one over the other.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I expect the latter to diverge—I don't think we should treat .match {…}… and .<reserved> … as interchangeable prefixes before a list of variants, but rather .match {…}… <variant>… as one thing complete in itself and .<reserved> … as something independent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motiviation for putting reserved-statement inside complex-body? Do we expect other multivariant constructs than match? I think this may be building too much flexibility into the spec. Could we instead agree that any future keywords would go with other declarations?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I now see that @gibson042 mentions this as one of the issues to work on as follow-ups in his review. How about we land this PR without reserved-statement in complex-body and then consider adding it, rather than landing it with doubts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I would find it easier to include it, show what impact it has across the stack, and then evaluate whether to drop it.

We have already agreed on having two follow up PRs to this that will both re-tread this ground. Could we agree that this will be considered more in both of those, and accept having it in the ABNF for now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I would find it easier to include it, show what impact it has across the stack, and then evaluate whether to drop it.

We should do things exactly the other way.

I also don't appreciate pushing hard for merging. We should merge when things look good to merge.

; Note that the following expression is a simplification,
; as this rule MUST NOT be considered to match existing keywords:
; (input / local / match) (s / "{")
reserved-statement = "." reserved-body 1*expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibson042 Please correct me if I misremember, but my understanding from today is that we're fine with reserving only statements that end in an expression?

That is not my recollection, and would preclude possibilities like .strict true. However, I'm fine with addressing it in a followup.

private-use = private-start reserved-body
private-start = "^" / "&"
; Reserve additional .keywords for use by future versions of this specification.
reserved-statement = reserved-keyword [s [reserved-body [s]]] 1*expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails to allow whitespace between the terminal expressions, and more importantly suffers from the reserved-body problem mentioned below.

Suggested change
reserved-statement = reserved-keyword [s [reserved-body [s]]] 1*expression
; A reserved statement is a reserved keyword followed by an
; optional sequence of space-separated parts and
; a final sequence of one or more expressions.
reserved-statement = reserved-keyword [*(s reserved-part) s reserved-statement-last] 1*([s] expression)
reserved-part = 1*(reserved-char / reserved-escape / expression) [quoted]
/ quoted
reserved-statement-last = *(reserved-char / reserved-escape / expression) (reserved-char / reserved-escape)
/ 1*(reserved-char / reserved-escape / expression) quoted
/ quoted

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular benefit to parsing out the spaces within reserved content? Originally in #374 (comment) we took them out of reserved-char to effectively trim the reserved expression contents within the ABNF.

Should we perhaps revisit that decision, and simplify some of this by not splitting the reserved text into parts?

In the meantime, good point about the missing [s] between expressions. If we leave the reserved cleanup to later, this should work for the current line:

Suggested change
reserved-statement = reserved-keyword [s [reserved-body [s]]] 1*expression
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular benefit to parsing out the spaces within reserved content? Originally in #374 (comment) we took them out of reserved-char to effectively trim the reserved expression contents within the ABNF.

I think there is; it allows us to more accurately capture adjacency, especially when it comes to things like consecutive declarations (ok but should remain separate) and directly adjacent quoted literals (should be rejected IMO) and—as you mentioned in another comment—dots, which should be accepted as part of a name but not when they appear to start a new statement.


reserved-body = *([s] 1*(reserved-char / reserved-escape / quoted))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be separate productions for reserved statement bodies and reserved annotation bodies, so the body of reserved/private-use annotations better aligns with that of function annotations.

Suggested change
reserved-body = *([s] 1*(reserved-char / reserved-escape / quoted))
; Each space-separated part is an arbitrary mixture of text and expressions,
; optionally followed by a quoted literal (such that no quoted literal may be
; directly followed by text or an expression, prohibiting e.g. `|a|b` and `|a||b|`).
reserved-annotation-body = *(s reserved-part)
reserved-part = 1*(reserved-char / reserved-escape / expression) [quoted]
/ quoted

Comment on lines 71 to 73
reserved-char = %x00-08 ; omit HTAB and LF
/ %x0B-0C ; omit CR
/ %x0E-19 ; omit SP
Copy link
Collaborator

@gibson042 gibson042 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserved-char should omit . so e.g. .new {$a} .hotness {$b} is recognized as two distinct reserved statements and .prefix {$foo} .local $var = {|val|} is recognized as a reserved statement followed by a local declaration, while still allowing reserved statements to include non-consecutive expressions (like local-declaration = local s variable [s] "=" [s] expression already does).

Suggested change
reserved-char = %x00-08 ; omit HTAB and LF
/ %x0B-0C ; omit CR
/ %x0E-19 ; omit SP
reserved-char = %x00-08 ; omit HTAB and LF
/ %x0B-0C ; omit CR
/ %x0E-19 ; omit SP
/ %x21-2D ; omit .
/ %x2F-5B ; omit \

And accordingly, reserved-escape should be generalized to e.g. reserved-statement-escape = backslash ( %x00-D7FF / %xE000-10FFFF ).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this precludes syntax like that used now in local-declaration, where we have a variable name at the top level, and name can include a .. Is that intentional?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it will need further tweaking to match . only after name-start or name-char, as inside name.

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
private-use = private-start reserved-body
private-start = "^" / "&"
; Reserve additional .keywords for use by future versions of this specification.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to @gibson042 from #529 (comment):

Please correct me if I misremember, but my understanding from today is that we're fine with reserving only statements that end in an expression?

That is not my recollection, and would preclude possibilities like .strict true. However, I'm fine with addressing it in a followup.

Could we merge this PR with this note in the ABNF, in expectation of getting a PR from @gibson042 that could update all reserved content holistically?

Suggested change
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
; NOTE: Rules for reserved statements and annotations
; should be considered provisional, pending a near-future PR updating them.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)

ps. I had not considered statements like .strict true. That does sound like something we ought to reserve space for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason statements end with expression is that the closing } of the expression is how we know the statement ended. Otherwise we're in reserved-body... forever. It is a hidden detail of our syntax that statements end with a close bracket.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While that's true with the current state of this PR, I believe @gibson042 would like us to drop . as a valid character in reserved-body, which would allow determining its end by encountering it at the start of the next keyword.

My request here is for us to find a way to not block this PR on that discussion, and to have it instead separately so that the syntax of both reserved statements and annotations could be considered in one package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with finalizing this PR. I don't think the reserved-statement discussion (were we to change something) needs to be in the way of that. It should be a separate conversation. If we want to pursue a change, it should be at least an issue if not a design document.

That said, I would not include the proposed note in this comment thread. Any changes we consider or make will be done in the near future. If we include text about it, put it in syntax.md.

Copy link
Collaborator

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have issues with syntactic reservations in this PR:

But that said, it is coherent and can be used as a foundation for further iteration.

Comment on lines +12 to +13
complex-body = quoted-pattern
/ ((selectors / reserved-statement) 1*([s] variant))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motiviation for putting reserved-statement inside complex-body? Do we expect other multivariant constructs than match? I think this may be building too much flexibility into the spec. Could we instead agree that any future keywords would go with other declarations?

/ %x7C ; omit }
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF
simple-start-char = %x0-2D ; omit .
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it looks like we don't append -char to other -start productions in our grammar.

Suggested change
simple-start-char = %x0-2D ; omit .
simple-start = %x0-2D ; omit .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there already is another rule that's called simple-start. If necessary, could we take care of any such editorial rule renames in a separate follow-up PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I'll open a new PR for this.

@aphillips
Copy link
Member

@stasm asked:

What's the motiviation for putting reserved-statement inside complex-body? Do we expect other multivariant constructs than match? I think this may be building too much flexibility into the spec. Could we instead agree that any future keywords would go with other declarations?

I think it was me who asked for reserved-statement to potentially be also a replacement for selectors. This would allow for a different type of selector in future syntax.

Perhaps this should be:

complex-body = quoted-pattern
             / (selectors 1*([s] variant))
             / reserved-statement ; which matches some-newly-unreserved-statement 1*([s] variant) 
                                  ; for some future statement

@stasm
Copy link
Collaborator

stasm commented Dec 1, 2023

I discussed with @eemeli in Slack and we agreed to merge this as-is, and iterate further starting from @gibson042's feedback in his reivew above: #529 (review).

@aphillips
Copy link
Member

Merging this to allow further progress. Outstanding discussions on reserved-statements and other discussions will be tracked in separate issues and/or PRs.

@aphillips aphillips merged commit 3f35bee into main Dec 1, 2023
@eemeli eemeli deleted the dot-keywords branch December 1, 2023 19:58
aphillips added a commit that referenced this pull request Dec 4, 2023
#548)

* Update the rest of the spec to match the ABNF after #529

* Apply suggestions from code review

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Remove reserved-statement from matcher

* Update spec/syntax.md

---------

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Co-authored-by: Addison Phillips <addison@unicode.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
specification Issue affects the specification syntax Issues related with syntax or ABNF
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reserve the namespace for future keywords
5 participants