Allow colon in name-start, matching XML Name #483
I'd rather see us go the other way: have a custom `name`, but stick to XML's `nmtoken`.
Making the name syntax stricter is actually good for interchange: it lowers the risk that other data formats will be unable to represent names easily.
Furthermore, we may need to forbid one certain character from the `name-char` production, in order to bless it as the namespace separator (#475).
When it comes to names, the source of truth is on our side (MF), and we care about making it expressible in other formats.
OTOH, I'd like to make our `nmtoken` conform exactly to XML's `Nmtoken`, because it will be used to represent data defined somewhere else than MF (e.g. LDML).
This, however, would mean giving up on unquoted literals as operands. I filed #478 to document why that's not acceptable (or perhaps to reconsider whether it actually is).
Thanks for thinking about this. I'm kind of sad to just jump into trying to "mend" the namespace. Solving the syntax will help us close out naming, not the other way around. I agree that we should either (a) commit to some established naming regime or (b) clearly cast off and define our own (and not pretend to be sorta-kinda something else). Some thoughts:
Thanks, this sounds aligned with the process I'd like us to follow here:
This is why I've been holding off on the `name`/`nmtoken` discussion: we might not even need it if some of our other discussions in flight require us to go with something else. I'd be OK hitting pause on this PR, too.
Right, this is important. Plus, realistically, any LDML troublemaker can still be quoted if needed. OTOH, I feel rather strongly about not differing by a single character from XML's Out of curiosity, I browsed the CLDR to look for any such troublemakers (i.e. LDML values which are XML |
Can you clarify why you feel this way? Is it because I tend to think that being "mostly compatible" with some standard, such as Or is it something else? Note that switching to For reference, I think it's helpful to remind ourselves of what is in

    NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
                      [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
                      [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
    NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
    Name          ::= NameStartChar (NameChar)*
    Nmtoken       ::= (NameChar)+

And where we are:

    name       = name-start *name-char
    name-start = ALPHA / "_"
               / %xC0-D6 / %xD8-F6 / %xF8-2FF
               / %x370-37D / %x37F-1FFF / %x200C-200D
               / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
               / %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
    name-char  = name-start / DIGIT / "-" / "." / ":"
               / %xB7 / %x300-36F / %x203F-2040

We are closer to
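
To make the difference between the two grammars quoted above concrete, here is a minimal Python sketch that transcribes them into regular expressions. The constant names (`START_COMMON`, `CHAR_EXTRA`, `XML_NAME`, `XML_NMTOKEN`, `MF_NAME`) are made up for this illustration and are not part of any spec or implementation; the ranges are copied from the productions in this thread.

```python
import re

# Ranges shared by XML NameStartChar (minus ":") and the draft name-start.
START_COMMON = (
    r"A-Za-z_"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Characters both grammars add for non-initial positions (":" handled below).
CHAR_EXTRA = r"0-9.\u00B7\u0300-\u036F\u203F-\u2040\-"

# XML: ":" is allowed everywhere, including the first position.
XML_NAME = re.compile(f"[{START_COMMON}:][{START_COMMON}{CHAR_EXTRA}:]*")
XML_NMTOKEN = re.compile(f"[{START_COMMON}{CHAR_EXTRA}:]+")
# Draft grammar quoted above: ":" is a name-char but not a name-start.
MF_NAME = re.compile(f"[{START_COMMON}][{START_COMMON}{CHAR_EXTRA}:]*")

for sample in ["foo", "foo:bar", ":foo", "1abc"]:
    print(
        sample,
        bool(XML_NAME.fullmatch(sample)),
        bool(MF_NAME.fullmatch(sample)),
        bool(XML_NMTOKEN.fullmatch(sample)),
    )
```

Among these samples, the two name patterns disagree only on `:foo`, which is the single-character difference this PR is about. `1abc` is rejected by both name patterns but accepted by the `Nmtoken` pattern.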
Thanks for asking. I should have elaborated in my previous comment, but it was already late here. What I meant to say is effectively the same as you did:
The "feel strongly" part was about the fact that today we keep talking about Then, there's also the matter of principles of design. I don't want us to reinvent concepts where well-established alternatives exist, in particular in matters not directly related to i18n. I think Related to the above point is this:
This is a nice side effect of reusing well-established concepts. It's likely not enough on its own to be the reason for sticking to XML's Nmtoken, but it's an example of additional benefits that we can reap if we do. Most importantly, I agree with you that we should first solve other issues currently in flight and then come back here and figure out what we want |
We should fix this by choosing XML Name's namespace instead of trying to force-fit XML Name.
The namespace for _name_ matches XML's [Name](https://www.w3.org/TR/xml/#NT-Name).

As `:` is also used as the start sigil of _function_,
using a _name_ with it as a first character is NOT RECOMMENDED.
I disagree with this. We have namespacing in another PR just now and there's a reasonable solution: instead of XML `Name`, use XML-Name's `NCName` as the basis. The definition of `NCName` is exactly "`Name` minus the `:` character".
Why should there be such an unnecessarily complex overlap between variable/function/option names and unquoted literals? If the function sigil were just replaced with something that does not appear in XML `NameChar` then the names could be exactly described by XML `Name` and the unquoted literals by either `1*name-char` (i.e., XML `Nmtoken`) or by `1*(name-char / "+" / …)`, in either case forming a strict superset rather than a near-superset to exclude artificially-induced ambiguity w.r.t. colons.
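
As a rough illustration of the membership question behind this argument (does a given character appear in XML `NameChar`?), here is a small Python sketch. The helper names (`in_ranges`, `is_xml_name_start_char`, `is_xml_name_char`) are hypothetical, and the code point ranges are transcribed from the XML productions quoted earlier in this conversation.

```python
# XML NameStartChar as inclusive code point ranges.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# What NameChar adds on top of NameStartChar.
NAME_CHAR_EXTRA_RANGES = [
    (0x2D, 0x2E),  # "-" and "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def in_ranges(ch, ranges):
    """Return True if the single character ch falls in any inclusive range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml_name_start_char(ch):
    return in_ranges(ch, NAME_START_RANGES)


def is_xml_name_char(ch):
    return is_xml_name_start_char(ch) or in_ranges(ch, NAME_CHAR_EXTRA_RANGES)


for ch in [":", "-", "+"]:
    print(repr(ch), is_xml_name_start_char(ch), is_xml_name_char(ch))
```

Under these ranges, `:` is both a `NameStartChar` and a `NameChar`, which is where the overlap with the function sigil comes from, while `+` is in neither set and would have to be added to the unquoted production explicitly.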
Closing, as this no longer fits with the accepted namespacing design.
Ping @gibson042, as I apparently can't request a review from you.
While recently looking at the syntax, I realised that there is no place in our syntax where a leading `:` in a name actually conflicts with the function sigil `:`. Using it as a first character is probably still a bad idea.

@stasm, note that this change would preclude us later allowing for multiple annotations in a single expression.

A `:` as a first character continues to not be allowed in unquoted, where it would indeed conflict with function.