From 5a7198cab7b9f3160d2affe08c31d12a74b926af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?=
Date: Sun, 12 Mar 2023 12:51:54 +0100
Subject: [PATCH] Allow colons in nmtokens

The current definition of the `nmtoken` production doesn't match XML's.
The same is true of `name`, but names are less of an issue.

`nmtokens` are frequently used in LDML to define attribute values which
are likely to be used as variant keys and option values.
---
 spec/message.abnf |  8 ++++----
 spec/syntax.md    | 13 ++++++-------
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/spec/message.abnf b/spec/message.abnf
index 25dfef07df..f16b2b9c28 100644
--- a/spec/message.abnf
+++ b/spec/message.abnf
@@ -42,15 +42,15 @@ function = ":" name
 markup-start = "+" name
 markup-end = "-" name
 
-name = name-start *name-char ; matches XML https://www.w3.org/TR/xml/#NT-Name
-nmtoken = 1*name-char ; matches XML https://www.w3.org/TR/xml/#NT-Nmtokens
+name = name-start *name-char ; based on https://www.w3.org/TR/xml/#NT-Name, but cannot start with U+003A COLON ":"
+nmtoken = 1*name-char ; equal to https://www.w3.org/TR/xml/#NT-Nmtoken
 name-start = ALPHA / "_"
            / %xC0-D6 / %xD8-F6 / %xF8-2FF
            / %x370-37D / %x37F-1FFF / %x200C-200D
            / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
            / %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
-name-char = name-start / DIGIT / "-" / "." / %xB7
-          / %x0300-036F / %x203F-2040
+name-char = name-start / DIGIT / "-" / "." / ":"
+          / %xB7 / %x0300-036F / %x203F-2040
 
 text-escape = backslash ( backslash / "{" / "}" )
 literal-escape = backslash ( backslash / "|" )
diff --git a/spec/syntax.md b/spec/syntax.md
index 0dd32e5357..f6c30cff81 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -464,10 +464,9 @@ Otherwise, the set of characters allowed in names is large.
 The _nmtoken_ token doesn't have _name_'s restriction on the first character
 and is used as variant keys and option values.
 
-_Note:_ The Name and Nmtoken symbols are intentionally defined to be
-the same as XML's [Name](https://www.w3.org/TR/xml/#NT-Name) and [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtokens)
+_Note:_ _nmtoken_ is intentionally defined to be the same as XML's [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken)
 in order to increase the interoperability with data defined in XML.
-In particular, the grammatical feature data [specified in LDML](https://unicode.org/reports/tr35/tr35-general.html#Grammatical_Features)
+In particular, the grammatical data [specified in LDML](https://unicode.org/reports/tr35/tr35-general.html#Grammatical_Features)
 and [defined in CLDR](https://unicode-org.github.io/cldr-staging/charts/latest/grammar/index.html)
 uses Nmtokens.
 
@@ -479,15 +478,15 @@ markup-end = "-" name
 ```
 
 ```abnf
-name = name-start *name-char ; matches XML https://www.w3.org/TR/xml/#NT-Name
-nmtoken = 1*name-char ; matches XML https://www.w3.org/TR/xml/#NT-Nmtokens
+name = name-start *name-char ; based on https://www.w3.org/TR/xml/#NT-Name, but cannot start with U+003A COLON ":"
+nmtoken = 1*name-char ; equal to https://www.w3.org/TR/xml/#NT-Nmtoken
 name-start = ALPHA / "_"
            / %xC0-D6 / %xD8-F6 / %xF8-2FF
            / %x370-37D / %x37F-1FFF / %x200C-200D
            / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
            / %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
-name-char = name-start / DIGIT / "-" / "." / %xB7
-          / %x0300-036F / %x203F-2040
+name-char = name-start / DIGIT / "-" / "." / ":"
+          / %xB7 / %x0300-036F / %x203F-2040
 ```
 
 ### Escape Sequences