Add negative-start rule #399
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
From @gibson042:
UAX 31 identifiers support a much more limited set of strings than we want to support as unquoted values; for example values such as One key differentiator here is that for
From @peter-b:
Please correct me if I'm mistaken, but this seems like a point that's more about For
"Needed" seems like an overly strong characterization. What problems would ensue from updating the syntax to match XML Nmtoken exactly (keeping in mind that it is possible for source to be syntactically valid while still being rejected for higher-level semantic reasons)? And if the deviation is actually necessary, shouldn't the reason be documented in a grammar comment, either directly or with a referencing link?
XML Nmtoken may start with either the
This clarification is provided in the
> To make _unquoted_ literals distinct from _function_ names,
> a literal MUST be quoted if it begins with a `:`
> or if it begins with a `-` that is **not** followed by a `.` or a digit.

At least my understanding is that our syntax is defined by both the
This addresses my previous comments, but this PR's discussion thread makes me think we should make some changes to the text to clarify what we're doing.
spec/syntax.md
Outdated
@@ -462,9 +462,10 @@ except for surrogate code points U+D800 through U+DFFF.
The characters `\` and `|` MUST be escaped as `\\` and `\|`.

**_Unquoted_** literals have a much more restricted range that
I think this is missing some information that could help avoid repeats of the conversation in this PR.
**_Unquoted_** literals have a much more restricted range that
A _literal_ MAY appear **_unquoted_** when its content matches
the `unquoted` production.
Similar to other productions in the message syntax,
the content of an _unquoted_ literal is identical to XML's
[Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken) except that specific characters
meaningful to other portions of the syntax are not allowed at the start. Specifically:
* A _literal_ MUST be quoted if it begins with a `:` (U+003A COLON).
* An _unquoted_ literal MUST NOT begin with a `-` (U+002D HYPHEN MINUS)
unless that character is followed by a `.` (U+002E FULL STOP) or an ASCII digit.
This allows literals which might be used to represent negative numbers, such
as `-42` or `-.345`, to appear unquoted.
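For reference, the two start-of-literal rules in the suggestion above could be checked roughly as follows. This is only a sketch in Python, not spec text: the function name `must_quote` is hypothetical, the treatment of an empty literal is an assumption, and the check deliberately ignores whether the remaining characters match the `unquoted` production.

```python
# Characters that may follow a leading "-" in an unquoted literal,
# per the suggested wording: "." (U+002E) or an ASCII digit.
ALLOWED_AFTER_HYPHEN = frozenset(".0123456789")


def must_quote(literal: str) -> bool:
    """Return True if the first-character rules force the literal to be quoted.

    Only the start of the literal is inspected; validating the rest of the
    content against the `unquoted` production is out of scope for this sketch.
    """
    if not literal:
        # Assumption: an empty literal can never appear unquoted.
        return True
    if literal[0] == ":":
        return True
    if literal[0] == "-":
        # literal[1:2] is "" when nothing follows the hyphen,
        # and "" is not a member of the allowed set.
        return literal[1:2] not in ALLOWED_AFTER_HYPHEN
    return False
```

With this sketch, `-42` and `-.345` are allowed to stay unquoted, while `-abc`, a bare `-`, and `:foo` must be quoted.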
I'm fine with being more verbose about the description here, but the above suggestion is a bit problematic:
- The first paragraph is somewhat tautological, given that this is how practically all of the syntax is defined.
- I don't think we have any other productions that refer to or approximate XML Nmtoken. Our name is rather close to XML Name, but its first-character rules are quite different.
If XML Nmtoken is sufficiently important to use as a foundation for unquoted literals, then having overlap between it and function prefix sigils seems like an unnecessary misstep. Without dragging this too far into the weeds, I will note that most ASCII punctuation characters are excluded from NameChar and therefore available for that use in MF2, particularly all of those that are currently included in
@gibson042 That's a good point. If we want to claim that our naming is rooted in XML, why not just go all the way and use
The important issue is that our syntax be as natural and as non-error-prone as possible. I don't see that there is much if any value to adhering strictly to Nmtoken if that adherence compromises the syntax, as long as we have an unambiguous definition.
Sure, but are you claiming that adopting Nmtoken would compromise the syntax? Is
@macchiati I agree. However, if we can achieve full compatibility with our chosen namespace scheme by choosing a different (somewhat randomly chosen) sigil, doesn't that (slightly) reduce potential errors? The syntax is only required to be internally consistent, but it's better if it is also widely compatible with the contexts in which it will appear and if exterior software requires only minimal modification to interact with our constructs. Of course, it isn't as if the world is constructed around
I continue to believe that this PR treats the symptom rather than the root cause. I think there are two blockers before we can consider this PR for merging:
Just to note, on the call of 4 September we agreed to merge this on or immediately after the end of the Seville colloquium on 13 September, unless we explicitly decide against doing so. No such decision has been made. I agree that this is treating the symptom rather than the root cause, but that something very much like this is required for negative number support with our current syntax. We should patch this now, while acknowledging that it may be wholly superseded by later changes.
That agreement was based on the assumption that we would have completed the design of open/close in Seville. That didn't happen and the design is still ongoing in #470. I want this PR to stay open because I want this to be a friction point, so that we don't get complacent with the current proposed design on open/close spannables. An alternative which I'd be happy to go with instead is to remove the current open/close features from the spec on the
We have a significant refactor of the ABNF that's going to be needed due to all of the changes we agreed to in Seville and later. We do need to incorporate negative number literals (which we have consensus on) at some point. I think your point, @stasm, is that this particular formulation is only needed if we use the sigil I don't think this PR is necessarily the right friction point for addressing the sigil use. My suggestion would be (a) merge this (knowing that it's potentially throw-away); (b) figure out the more-radical syntax changes (specifically text-mode and namespacing) and implement those in the ABNF; (c) then address any changes for spannable (which might change the sigils). Isn't #470 sufficient as a friction point for the last?
Held for discussion in 2023-11-13 call
I believe this is now obsolete and should be closed in favor of #553
Yeah, this is out of date.
This is yet another potential solution for allowing unquoted negative numbers; see also #397 and #398.
This change makes a specific exception for `-` to start an unquoted, provided that it is followed by a `.` or a digit. As these characters are not included in `name-start`, they disambiguate the parsing of e.g. `-1` or `-.5` when used as an operand.
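To illustrate the disambiguation, here is a rough Python sketch of the one-character lookahead a parser could use after seeing a leading `-`. It is not the spec's grammar: `NAME_START` is an ASCII-only stand-in for the real `name-start` production, and the function name and return labels are hypothetical.

```python
import string

# Assumption: simplified, ASCII-only approximation of `name-start`.
# The real production is broader; what matters here is that it
# contains neither "." nor the digits.
NAME_START = frozenset(string.ascii_letters + "_")

# Characters that mark a leading "-" as the start of an unquoted literal.
LITERAL_FOLLOWERS = frozenset("." + string.digits)


def classify_after_hyphen(token: str) -> str:
    """Classify a token that begins with "-" by looking one character ahead."""
    if not token.startswith("-"):
        raise ValueError("token must start with '-'")
    nxt = token[1:2]  # "" when the token is just "-"
    if nxt in LITERAL_FOLLOWERS:
        return "unquoted-literal"
    if nxt in NAME_START:
        return "function-name"
    return "neither"
```

Because the two follower sets are disjoint, `-1` and `-.5` can only be read as unquoted literals, and something like `-foo` can only be read as a sigil followed by a name.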