Skip to content

Add explicit whitespace definitions #344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 28, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 21 additions & 26 deletions spec/message.ebnf
Original file line number Diff line number Diff line change
@@ -1,25 +1,20 @@
Message ::= Declaration* ( Pattern | Selector Variant+ )
Message ::= (s? Declaration)* s? Body s?

/* Preamble */
Declaration ::= 'let' WhiteSpace Variable '=' '{' Expression '}'
Selector ::= 'match' ( '{' Expression '}' )+
Declaration ::= 'let' s Variable s? '=' s? '{' s? Expression s? '}'
Body ::= Pattern | Selector (s? Variant)+

/* Variants and Patterns */
Variant ::= 'when' ( WhiteSpace VariantKey )+ Pattern
VariantKey ::= Literal | Nmtoken | '*'
Pattern ::= '{' (Text | Placeholder)* '}' /* ws: explicit */
Pattern ::= '{' (Text | Placeholder)* '}'
Selector ::= 'match' (s? '{' s? Expression s? '}')+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the rule on this line, another point to consider is the name of the rule. It's the singular word "Selector" when it includes 1 or more things that we have been casually referring to as selectors. So I think it would be clearer to use the plural word "Selectors" and separate out a rule for the syntax of that individual concept named using the singular word "Selector":

Suggested change
Selector ::= 'match' (s? '{' s? Expression s? '}')+
Selectors ::= 'match' (s Selector)+
Selector ::= '{' s? Expression s? '}'

If doing so, then rename the reference to this rule within the Body rule accordingly.

(This suggested change also incorporates @eemeli's suggested change in a parallel comment that LGTM.)

Copy link
Collaborator Author

@stasm stasm Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @echeran. I tried to limit the changes in this PR to only those related to whitespace. That's why in the description I wrote: Please review this PR without paying too much attention to naming. :)

The exact change you're suggesting is implemented in #347. Glad to see that we agree :)

Variant ::= 'when' (s Key)+ s? Pattern
Key ::= Nmtoken | Literal | '*'

/* Placeholders */
Placeholder ::= '{' (Expression | Markup | MarkupEnd) '}'
Placeholder ::= '{' s? (Expression | Markup | MarkupEnd) s? '}'

/* Expressions */
Expression ::= Operand Annotation? | Annotation
Operand ::= Literal | Variable
Annotation ::= Function Option*
Option ::= Name '=' (Literal | Nmtoken | Variable)
Expression ::= (Literal | Variable) (s Annotation)? | Annotation
Annotation ::= Function (s Option)*
Option ::= Name s? '=' s? (Literal | Nmtoken | Variable)

/* Markup Tags */
Markup ::= MarkupStart Option*
Markup ::= MarkupStart (s Option)*

<?TOKENS?>

Expand All @@ -29,12 +24,12 @@ TextChar ::= AnyChar - ('{' | '}' | Esc)
AnyChar ::= [#x0-#x10FFFF] - [#xD800-#xDFFF]

/* Names */
Variable ::= '$' Name /* ws: explicit */
Function ::= ':' Name /* ws: explicit */
MarkupStart ::= '+' Name /* ws: explicit */
MarkupEnd ::= '-' Name /* ws: explicit */
Name ::= NameStart NameChar* /* ws: explicit */
Nmtoken ::= NameChar+ /* ws: explicit */
Variable ::= '$' Name
Function ::= ':' Name
MarkupStart ::= '+' Name
MarkupEnd ::= '-' Name
Name ::= NameStart NameChar*
Nmtoken ::= NameChar+
NameStart ::= [a-zA-Z] | "_"
| [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF]
| [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D]
Expand All @@ -44,13 +39,13 @@ NameChar ::= NameStart | [0-9] | "-" | "." | #xB7
| [#x0300-#x036F] | [#x203F-#x2040]

/* Literals */
Literal ::= '(' (LiteralChar | LiteralEscape)* ')' /* ws: explicit */
Literal ::= '(' (LiteralChar | LiteralEscape)* ')'
LiteralChar ::= AnyChar - ('(' | ')' | Esc)

/* Escape sequences */
Esc ::= '\'
TextEscape ::= Esc Esc | Esc '{' | Esc '}'
LiteralEscape ::= Esc Esc | Esc '(' | Esc ')'

/* WhiteSpace */
WhiteSpace ::= #x9 | #xD | #xA | #x20 /* ws: definition */
/* White space */
s ::= (#x9 | #xD | #xA | #x20)+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use LWSP, which is built-in, instead of defining our own?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the definition in https://datatracker.ietf.org/doc/html/rfc5234#appendix-B, LWSP excludes standalone LF, which I don't think we want to do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same appendix says that it contains some basic rules that are in common use and that [b]asic rules are in uppercase. Does common use mean that they're built-in, or perhaps that they're expected to be defined in a certain way?

The reason I'm asking is that I found an ABNF parser at https://author-tools.ietf.org/abnf and it complained about DIGIT and ALPHA being undefined. Bug or intended behavior?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer keeping an explicit whitespace definition within the spec, rather than introducing an external dependency.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same appendix says that it contains some basic rules that are in common use and that [b]asic rules are in uppercase. Does common use mean that they're built-in, or perhaps that they're expected to be defined in a certain way?

I think in an RFC, you would usually say that the following rules are imported from RFC 5234, and give a list of the rules you need.

The reason I'm asking is that I found an ABNF parser at https://author-tools.ietf.org/abnf and it complained about DIGIT and ALPHA being undefined. Bug or intended behavior?

DIGIT and ALPHA are too frequent for such a bug not having been detected. I'd guess you have to add the rules you reference from RFC 5234 by hand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bill's parser (which powers author-tools) has always complained about the built-ins. I think most I-D authors ignore the warnings (I know I learned to). For example, if you run BCP47's ABNF in the tool, it complains about all of the built-ins (and BCP47 is a published set of RFCs blessed by the editor):

Language-Tag = langtag / privateuse / grandfathered
langtag = language [ "-" script ] [ "-" region ] *( "-" variant ) *( "-" extension ) [ "-" privateuse ]
; ALPHA UNDEFINED
language = ( 2*3ALPHA [ "-" extlang ] ) / 4ALPHA / 5*8ALPHA
extlang = 3ALPHA *2( "-" 3ALPHA )
script = 4ALPHA
; DIGIT UNDEFINED
region = 2ALPHA / 3DIGIT
variant = 5*8alphanum / ( DIGIT 3alphanum )
extension = singleton 1*( "-" ( 2*8alphanum ) )
singleton = DIGIT / %x41-57 / %x59-5A / %x61-77 / %x79-7A
privateuse = "x" 1*( "-" ( 1*8alphanum ) )
grandfathered = irregular / regular
irregular = "en-GB-oed" / "i-ami" / "i-bnn" / "i-default" / "i-enochian" / "i-hak" / "i-klingon" / "i-lux" / "i-mingo" / "i-navajo" / "i-pwn" / "i-tao" / "i-tay" / "i-tsu" / "sgn-BE-FR" / "sgn-BE-NL" / "sgn-CH-DE"
regular = "art-lojban" / "cel-gaulish" / "no-bok" / "no-nyn" / "zh-guoyu" / "zh-hakka" / "zh-min" / "zh-min-nan" / "zh-xiang"
alphanum = ( ALPHA / DIGIT )
obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8ALPHA
subtag = 1*8( ALPHA / DIGIT )
; CRLF UNDEFINED
registry = record *( "%%" CRLF record )
record = 1*field
field = ( field-name field-sep field-body CRLF )
field-name = ( ALPHA / DIGIT ) [ *( ALPHA / DIGIT / "-" ) ( ALPHA / DIGIT ) ]
; SP UNDEFINED
field-sep = *SP ":" *SP

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the definition in https://datatracker.ietf.org/doc/html/rfc5234#appendix-B, LWSP excludes standalone LF, which I don't think we want to do?

I'm curious about the LF and its exclusion from LWSP. In addition to +1 on @eemeli's comment to allow for whitespace between Declarations (the let ...), I think it would be natural for people who want to serialize messages to a file to want to put a newline in between them. Same applies to the comment for whitespace between each variant in a selection message. At least, by "natural", I mean that I think this is how we have already written all of our example messages in our documentation so far.

What are the reasons for LWSP not including LF (what use cases does it support, and do those apply here)?

However we decide this, I think our examples in our documentation should match our syntax, and we just have to update one or both to ensure that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to +1 on @eemeli's comment to allow for whitespace between Declarations (the let ...) [...]

Did you mean to say "to allow" or "to require"? @eemeli's comment uses a stronger language.

Same applies to the comment for whitespace between each variant in a selection message. At least, by "natural", I mean that I think this is how we have already written all of our example messages in our documentation so far.

I agree about the "natural". I think we should recommend readable formatting and lead by example. This PR is more about defining the bounds of what's considered valid, even if it's formatted horribly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stasm Ah, yep, I meant "require".