First draft of some registry functions #420

Merged 12 commits on Jul 24, 2023
8 changes: 4 additions & 4 deletions spec/registry.dtd
@@ -1,6 +1,6 @@
-<!ELEMENT registry (function*|pattern*)>
+<!ELEMENT registry (function,pattern)*>
Collaborator

Thanks for noticing this. I filed #434 to discuss and fix this independently of this PR, so that you can focus it on registry.xml.
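
To illustrate why the content model matters, here is a rough sketch that treats each content model as a regular expression over the sequence of child element names. The one-letter encoding, the helper name `accepts`, and the third model are illustrative assumptions, not taken from the spec or from #434; real DTD validation would use a validating XML parser.

```python
import re

# Encode each child element as one letter: f = <function>, p = <pattern>.
CONTENT_MODELS = {
    "(function*|pattern*)": r"(?:f*|p*)",  # before this PR: only functions OR only patterns
    "(function,pattern)*": r"(?:fp)*",     # after this PR: strictly alternating pairs
    "(function|pattern)*": r"(?:f|p)*",    # hypothetical alternative: any mix, any order
}

def accepts(model: str, children: str) -> bool:
    """Return True if the encoded child sequence satisfies the content model."""
    return re.fullmatch(CONTENT_MODELS[model], children) is not None

# registry.xml in this PR has six <pattern> elements followed by two <function> elements.
children = "pppppp" + "ff"

for model in CONTENT_MODELS:
    print(model, accepts(model, children))
```

Under this toy model, neither the old nor the new declaration accepts the child order used by registry.xml in this PR, while the hypothetical third one does.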


-<!ELEMENT function (description|(formatSignature|matchSignature)+)>
+<!ELEMENT function (description,(formatSignature|matchSignature)+)>
<!ATTLIST function name NMTOKEN #REQUIRED>

<!ELEMENT description (#PCDATA)>
@@ -9,11 +9,11 @@
<!ATTLIST pattern id ID #REQUIRED>
<!ATTLIST pattern regex CDATA #REQUIRED>

-<!ELEMENT formatSignature (input?|option*)>
+<!ELEMENT formatSignature (input?,option*)>
<!ATTLIST formatSignature position (open|close|standalone) "standalone">
<!ATTLIST formatSignature locales NMTOKENS #IMPLIED>

-<!ELEMENT matchSignature (input?|option*|match*)>
+<!ELEMENT matchSignature (input?,option*,match*)>
<!ATTLIST matchSignature locales NMTOKENS #IMPLIED>

<!ELEMENT input EMPTY>
5 changes: 3 additions & 2 deletions spec/registry.md
@@ -90,7 +90,7 @@ For the sake of brevity, only `locales="en"` is considered.
<function name="number">
<description>
Format a number.
-Match a numerical value against CLDR plural categories or against a number literal.
+Match a **formatted** numerical value against CLDR plural categories or against a number literal.
Contributor

Does this imply that the input is a localized number that would be parsed and then matched? Also, since this function includes both selection and formatting, should the description also say something about the formatting part?

Collaborator Author

No. I've merged a suggestion from Addison which does not mention formatting.

There is an ongoing discussion in issue #425; see especially this comment: #425 (comment)

</description>

<matchSignature locales="en">
@@ -101,7 +101,8 @@ For the sake of brevity, only `locales="en"` is considered.
<option name="maximumFractionDigits" pattern="positiveInteger"/>
<option name="minimumSignificantDigits" pattern="positiveInteger"/>
<option name="maximumSignificantDigits" pattern="positiveInteger"/>
-<match values="one other"/>
+<!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
+<match values="zero one two few many"/>
<match pattern="anyNumber"/>
</matchSignature>

147 changes: 147 additions & 0 deletions spec/registry.xml
@@ -0,0 +1,147 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="registry.dtd" type="application/xml-dtd"?>
<registry>
<!-- All regex here are to be seen as provisory. See issue #422. -->
<pattern id="anyNumber" regex="-?(0|([1-9]\d*))(\.\d*)?([eE][-+]?\d+)?"/>
<pattern id="positiveInteger" regex="0|([1-9]\d*)"/>
<pattern id="currencyCode" regex="[A-Z]{3}"/>
<pattern id="timeZoneId" regex="[a-zA-Z/]+"/>
Contributor

We could use a more restricted grammar that only allows valid IANA timezone IDs, but I don't feel very strongly about it. Exhibit A: https://tc39.es/proposal-temporal/#prod-TimeZoneIANAName

Collaborator Author

The current registry spec does not allow this; it supports only a regex or a list of values.

I suggested changing the spec to allow URLs to external specs, and added it as a discussion point in #422.

Contributor

The Temporal polyfill includes a RegExp that works for this purpose, but unfortunately it isn't a literal. If we agreed on a restrictive regex here that only allows valid IANA tzids, I imagine we could come up with a simplified regex that works in this case.

Collaborator Author

I think that in the longer run (not in the first commit) we should also add support for Unicode Time Zone Identifiers.

The main reason is stability, which the Olson database IDs do not offer (see linked).
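
As a quick check of how the provisional `timeZoneId` pattern behaves, the sketch below tests a few identifiers against it. It assumes a pattern must match the whole value (whether registry patterns are anchored is not stated here; see #422), and the helper name and sample identifiers are chosen purely for illustration.

```python
import re

# Provisional pattern copied from registry.xml in this PR.
TIME_ZONE_ID = re.compile(r"[a-zA-Z/]+")

def is_time_zone_id(value: str) -> bool:
    """Full-match check; anchoring is an assumption, not something the spec states."""
    return TIME_ZONE_ID.fullmatch(value) is not None

samples = ["UTC", "Asia/Shanghai", "Asia/Kolkata", "America/New_York", "Etc/GMT+1", "+01:00"]
for tz in samples:
    print(f"{tz!r}: {is_time_zone_id(tz)}")
```

Under these assumptions `America/New_York` is rejected, because `_` is outside the character class. The same holds for identifiers containing digits, `+`, or `-`, and for offset strings such as `+01:00`.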

<pattern id="anythingNotEmpty" regex=".+"/>
<pattern id="iso8601" regex="\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"/>

<function name="datetime">
<!-- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat/DateTimeFormat -->
<description>Locale-sensitive date and time formatting</description>

<formatSignature>
<input pattern="iso8601"/>
<!-- The predefined date formatting style to use. -->
<option name="dateStyle" values="full long medium short"/>
<!-- The predefined time formatting style to use. -->
<option name="timeStyle" values="full long medium short"/>
<!-- Calendar to use. -->
<option name="calendar" values="buddhist chinese coptic dangi ethioaa ethiopic gregory hebrew indian islamic islamic-umalqura islamic-tbla islamic-civil islamic-rgsa iso8601 japanese persian roc"/>
<!-- Numbering system to use. -->
<option name="numberingSystem" values="arab arabext bali beng deva fullwide gujr guru hanidec khmr knda laoo latn limb mlym mong mymr orya tamldec telu thai tibt"/>
<!-- The time zone to use. The only value implementations must recognize
is "UTC"; the default is the runtime's default time zone.
Implementations may also recognize the time zone names of the IANA
time zone database, such as "Asia/Shanghai", "Asia/Kolkata",
"America/New_York".
-->
<option name="timeZone" pattern="timeZoneId"/>
Comment on lines +26 to +32
Member

Don't forget offset time zones.

Collaborator Author

It is 100% what ECMA says:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat/DateTimeFormat#timezone

I tried to not include anything that is not currently supported by ECMAScript, as we know there is strong opposition to that.

I'm in favor of it and happy to discuss, but I didn't change this XML.

Added a topic in #422

Member

This is what the function accepts as validated input. It doesn't mean that every possible value is supported.

</formatSignature>

<!-- TODO: clarify if this is OK or if it is an abuse.
The intention is to show that dateStyle / timeStyle and the other
options are conflicting, you can use either / or, but not both.
-->
<formatSignature>
<input pattern="iso8601"/>
<!-- Calendar to use. -->
<option name="calendar" values="buddhist chinese coptic dangi ethioaa ethiopic gregory hebrew indian islamic islamic-umalqura islamic-tbla islamic-civil islamic-rgsa iso8601 japanese persian roc"/>
<!-- The formatting style used for day periods like "in the morning", "am", "noon", "n" etc. -->
<option name="dayPeriod" values="narrow short long"/>
<!-- Numbering system to use. -->
<option name="numberingSystem" values="arab arabext bali beng deva fullwide gujr guru hanidec khmr knda laoo latn limb mlym mong mymr orya tamldec telu thai tibt"/>
<!-- The time zone to use. The only value implementations must recognize
is "UTC"; the default is the runtime's default time zone.
Implementations may also recognize the time zone names of the IANA time zone
database, such as "Asia/Shanghai", "Asia/Kolkata", "America/New_York".
-->
<option name="timeZone" pattern="timeZoneId"/>
<!-- The hour cycle to use. -->
<option name="hourCycle" values="h11 h12 h23 h24"/>
<!-- The representation of the weekday. -->
<option name="weekday" values="long short narrow"/>
<!-- The representation of the era. -->
<option name="era" values="long short narrow"/>
<!-- The representation of the year. -->
<option name="year" values="numeric 2-digit"/>
<!-- The representation of the month. -->
<option name="month" values="numeric 2-digit long short narrow"/>
<!-- The representation of the day. -->
<option name="day" values="numeric 2-digit"/>
<!-- The representation of the hour. -->
<option name="hour" values="numeric 2-digit"/>
<!-- The representation of the minute. -->
<option name="minute" values="numeric 2-digit"/>
<!-- The representation of the second. -->
<option name="second" values="numeric 2-digit"/>
<!-- The number of digits used to represent fractions of a second
(any additional digits are truncated). -->
<option name="fractionalSecondDigits" values="1 2 3"/>
<!-- The localized representation of the time zone name. -->
<option name="timeZoneName" values="long short shortOffset longOffset shortGeneric longGeneric"/>
</formatSignature>

</function>

<function name="number">
<!-- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/NumberFormat/NumberFormat -->
<description>Locale-sensitive number formatting</description>

<formatSignature>
<input pattern="anyNumber"/>
<!-- Only used when notation is "compact". -->
<option name="compactDisplay" values="short long" default="short"/>
<!-- The currency to use in currency formatting.
Possible values are the ISO 4217 currency codes, such as "USD" for the US dollar,
"EUR" for the euro, or "CNY" for the Chinese RMB — see the
Current currency & funds code list
(https://www.six-group.com/en/products-services/financial-information/data-standards.html#scrollTo=currency-codes).
There is no default value; if the style is "currency", the currency property must be provided.
-->
<option name="currency" pattern="currencyCode"/>
<!-- How to display the currency in currency formatting. -->
<option name="currencyDisplay" values="symbol narrowSymbol code name" default="symbol"/>
<!-- In many locales, accounting format means to wrap the number with parentheses
instead of appending a minus sign. You can enable this formatting by setting the
currencySign option to "accounting".
-->
<option name="currencySign" values="accounting standard" default="standard"/>
<!-- The formatting that should be displayed for the number. -->
<option name="notation" values="standard scientific engineering compact" default="standard"/>
<!-- Numbering system to use. -->
<option name="numberingSystem" values="arab arabext bali beng deva fullwide gujr guru hanidec khmr knda laoo latn limb mlym mong mymr orya tamldec telu thai tibt"/>
<!-- When to display the sign for the number. -->
<!-- "negative" value is Experimental. -->
<option name="signDisplay" values="auto always exceptZero never" default="auto"/>
<!-- The formatting style to use. -->
<option name="style" values="decimal currency percent unit" default="decimal"/>
Comment on lines +110 to +111
Collaborator

To consider: If we were to leave out currency and unit formatting from the default, then we wouldn't need to say how compound units need to work.

<!-- The unit to use in unit formatting.
Possible values are core unit identifiers, defined in UTS #35, Part 2, Section 6.
A subset of units from the full list was selected for use in ECMAScript.
Pairs of simple units can be concatenated with "-per-" to make a compound unit.
There is no default value; if the style is "unit", the unit property must be provided.
-->
<option name="unit" pattern="anythingNotEmpty"/>
<!-- The unit formatting style to use in unit formatting. -->
<option name="unitDisplay" values="long short narrow" default="short"/>
<!-- The minimum number of integer digits to use.
A value with a smaller number of integer digits than this number will be
left-padded with zeros (to the specified length) when formatted.
-->
<option name="minimumIntegerDigits" pattern="positiveInteger" default="1"/>
<!-- The minimum number of fraction digits to use.
The default for plain number and percent formatting is 0;
the default for currency formatting is the number of minor unit digits provided by
the ISO 4217 currency code list (2 if the list doesn't provide that information).
-->
<option name="minimumFractionDigits" pattern="positiveInteger"/>
<!-- The maximum number of fraction digits to use.
The default for plain number formatting is the larger of minimumFractionDigits and 3;
the default for currency formatting is the larger of minimumFractionDigits and the number of minor
unit digits provided by the ISO 4217 currency code list (2 if the list doesn't provide that information);
the default for percent formatting is the larger of minimumFractionDigits and 0.
-->
<option name="maximumFractionDigits" pattern="positiveInteger"/>
<!-- The minimum number of significant digits to use. -->
<option name="minimumSignificantDigits" pattern="positiveInteger" default="1"/>
<!-- The maximum number of significant digits to use. -->
<option name="maximumSignificantDigits" pattern="positiveInteger" default="21"/>
</formatSignature>

</function>

</registry>
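
For readers wondering how a registry like this might be consumed, here is a minimal sketch that loads a small fragment and validates an option value against either its `values` list or its referenced `pattern`. The function names, the full-match semantics, the decision to reject unknown options, and the decision to stop at the first signature that declares the option are all assumptions made for illustration; none of them is defined by this PR.

```python
import re
import xml.etree.ElementTree as ET

# A small fragment in the shape of registry.xml (raw string keeps "\d" intact).
REGISTRY_XML = r"""
<registry>
  <pattern id="positiveInteger" regex="0|([1-9]\d*)"/>
  <pattern id="currencyCode" regex="[A-Z]{3}"/>
  <function name="number">
    <description>Locale-sensitive number formatting</description>
    <formatSignature>
      <option name="currency" pattern="currencyCode"/>
      <option name="notation" values="standard scientific engineering compact" default="standard"/>
      <option name="maximumFractionDigits" pattern="positiveInteger"/>
    </formatSignature>
  </function>
</registry>
"""

def load_patterns(root):
    """Map each <pattern id> to its compiled regex."""
    return {p.get("id"): re.compile(p.get("regex")) for p in root.findall("pattern")}

def is_valid_option(root, function_name, option_name, value):
    """Check a value against the first formatSignature option with this name."""
    patterns = load_patterns(root)
    path = f"function[@name='{function_name}']/formatSignature/option"
    for option in root.findall(path):
        if option.get("name") != option_name:
            continue
        if option.get("values") is not None:
            # "values" is a space-separated list of allowed literals.
            return value in option.get("values").split()
        if option.get("pattern") is not None:
            # "pattern" refers to a <pattern> element by id.
            return patterns[option.get("pattern")].fullmatch(value) is not None
    return False  # unknown option: rejected in this sketch

root = ET.fromstring(REGISTRY_XML)
print(is_valid_option(root, "number", "notation", "compact"))
print(is_valid_option(root, "number", "currency", "usd"))
print(is_valid_option(root, "number", "maximumFractionDigits", "2"))
```

The sketch also shows why the distinction between `values` and `pattern` matters: an option declared with `values="positiveInteger"` would accept only the literal string `positiveInteger`, not a number.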