-
Notifications
You must be signed in to change notification settings - Fork 400
CLDR-18889 v48 BRS generate algorithmic #4961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLDR-18889 v48 BRS generate algorithmic #4961
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
- I would update the docs to recommend running this tool after generating algorithmic, but the docs aren't migrated over <https://unicode-org.atlassian.net/browse/CLDR-18606?focusedCommentId=186098> - TestLocale doesn't look like it is a useful test, filed https://unicode-org.atlassian.net/browse/CLDR-18890
- add @cdata tag to request CDATA output - update transformer to skip rbnfRules for DAIP - update XML generator
7ddf2a2
to
b679fcd
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
had to redo this. Algorithmic transforms are cumulative so I can't just re-run it. |
Help? what is this error?
|
yue_Hans which is default CN, requires H as the preferred: cldr/common/supplemental/supplementalData.xml Line 4788 in 649eb5d
I assume that yue_Hans is being generated with h formats since HK prefers h: cldr/common/supplemental/supplementalData.xml Line 4798 in 649eb5d
Nice to see the error working :) |
The transform is from <likelySubtag from="yue" to="yue_Hant_HK"/>
<likelySubtag from="yue_CN" to="yue_Hans_CN"/>
<likelySubtag from="yue_Hans" to="yue_Hans_CN"/> |
common/main/yue_Hans.xml
Outdated
<pattern>HH:mm:ss [zzzz]</pattern> | ||
<datetimeSkeleton>HHmmssz</datetimeSkeleton> | ||
</timeFormat> | ||
</timeFormatLength> | ||
<timeFormatLength type="long"> | ||
<timeFormat> | ||
<pattern>ah:mm:ss [z]</pattern> | ||
<datetimeSkeleton>ahmmssz</datetimeSkeleton> | ||
<pattern>HH:mm:ss [z]</pattern> | ||
<datetimeSkeleton>HHmmssz</datetimeSkeleton> | ||
</timeFormat> | ||
</timeFormatLength> | ||
<timeFormatLength type="medium"> | ||
<timeFormat> | ||
<pattern>ah:mm:ss</pattern> | ||
<datetimeSkeleton>ahmmss</datetimeSkeleton> | ||
<pattern>HH:mm:ss</pattern> | ||
<datetimeSkeleton>HHmmss</datetimeSkeleton> | ||
</timeFormat> | ||
</timeFormatLength> | ||
<timeFormatLength type="short"> | ||
<timeFormat> | ||
<pattern>ah:mm</pattern> | ||
<datetimeSkeleton>ahmm</datetimeSkeleton> | ||
<pattern>HH:mm</pattern> | ||
<datetimeSkeleton>HHmm</datetimeSkeleton> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, here are the 'promotions' I did changing ah
to HH
. Is the dateTimeSkeleton correct?
looks like tests pass |
Which docs aren't migrated? |
You migrated them since I wrote that and it was updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions
@@ -168,7 +168,7 @@ x.x: << Komma >>; | |||
1: =%spellout-ordinal=; | |||
%%ste2: | |||
0: ste; | |||
1: ‘ =%spellout-ordinal=; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replacement of curly by straight apostrophe is suspicious. I don't think that should be in the transform, or otherwise be special-cased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure. Perhaps it's an effect of daip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The straight quote is correct. The curly quote is incorrect. The straight quote means quote the space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous data was bad. 79b4e8b#diff-67161b041fe0ebebee961182647df51363f1264afbde64c57c894239b17f1c37L180
This bad curly quote was introduced here in 2021: d4cb32a#diff-67161b041fe0ebebee961182647df51363f1264afbde64c57c894239b17f1c37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, this line change seems correct to me.
@@ -209,221 +209,10 @@ x.x: =#,##0.#=; | |||
%spellout-ordinal-m: | |||
-x: minus >>; | |||
x.x: =#,##0.#=; | |||
0: =%spellout-ordinal=m; | |||
]]></rbnfRules> | |||
<!-- The following redundant ruleset elements have been deprecated and will be removed in the next release. Please use the rbnfRules contents instead. --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are leaving in the old rules for one release, so they shouldn't disappear. Also, very odd to keep just one ("30").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because of MINIMIZE rules. If transformed = original, it drops the value.
'30' has dreissig
which is transformed from dreißig
- that's the only modified item, the rest get dropped.
@@ -12,324 +12,28 @@ CLDR data files are interpreted according to the LDML specification (http://unic | |||
<territory type="CH"/> | |||
</identity> | |||
<annotations> | |||
<annotation cp="{">↑↑↑</annotation> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like all of the ↑↑↑ lines are disappearing. That's ok; I'm a little surprised that they were in the last release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a separate bug that was fixed.
@@ -115,10 +115,10 @@ x.x: << koma >>; | |||
1000000000000000000: =#,##0=; | |||
%%ordi: | |||
0: i; | |||
1: ‘ i =%spellout-ordinal=; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, odd that ‘ is being converted to '
tools/cldr-code/src/main/java/org/unicode/cldr/tool/CLDRFileTransformer.java
Show resolved
Hide resolved
Co-authored-by: Mark Davis <mark@unicode.org>
@@ -115,10 +115,10 @@ x.x: << koma >>; | |||
1000000000000000000: =#,##0=; | |||
%%ordi: | |||
0: i; | |||
1: ‘ i =%spellout-ordinal=; | |||
1: ' i =%spellout-ordinal=; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where the left hand side came from - maybe by hand.
However, this line is actually converted from:
Lines 117 to 118 in 6a57fe5
1: ' и =%spellout-ordinal=; %%ordti:
which shows '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The straight quote is correct. The curly quote is incorrect. This error was also introduced in this commit from 2021: d4cb32a#diff-15b116f0c5be91a93b36df309a3a5f8d78b7bc8c0d2de99d84c2b8c083f7d133
This change seems correct to me.
@@ -168,7 +168,7 @@ x.x: << Komma >>; | |||
1: =%spellout-ordinal=; | |||
%%ste2: | |||
0: ste; | |||
1: ‘ =%spellout-ordinal=; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@macchiati the U+2018 |
I re-ran cldrmodify no arg and |
OK, i see the issue with the old style rules being dropped. |
- retain rbnfrules even with MINIMIZE - add CLI options for partial generation
Does (did) If it does block, then this PR is optimal as is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving so we can get the bulk of the data in. We can look at the ' ‘ issue later.
Filed https://unicode-org.atlassian.net/browse/CLDR-18926 to track. |
This brings up a possible future problem. If we apply the script transforms
to the rules *as is*, then there is the possibility of their changing
*syntax* characters as well as the *literal text*. Probably the right thing
for the future is to apply the transform line-by-line, but also parse the
line to only apply to literal text. That is, in
1: ' и =%spellout-ordinal=;
it would only apply to the underlined substring below.
1: ' *и* =%spellout-ordinal=;
Steven, can you file a follow-up ticket to do that?
…On Tue, Sep 2, 2025 at 11:56 AM George Rhoten ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In common/rbnf/sr_Latn.xml
<#4961 (comment)>:
> @@ -115,10 +115,10 @@ x.x: << koma >>;
1000000000000000000: =#,##0=;
%%ordi:
0: i;
-1: ‘ i =%spellout-ordinal=;
+1: ' i =%spellout-ordinal=;
The straight quote is correct. The curly quote is incorrect. This error
was also introduced in this commit from 2021: d4cb32a
#diff-15b116f0c5be91a93b36df309a3a5f8d78b7bc8c0d2de99d84c2b8c083f7d133
<d4cb32a#diff-15b116f0c5be91a93b36df309a3a5f8d78b7bc8c0d2de99d84c2b8c083f7d133>
This change seems correct to me.
—
Reply to this email directly, view it on GitHub
<#4961 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMCBIVVDOH3KJRP2SND3QXR5NAVCNFSM6AAAAACDXXZ4H6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTCNZXG42TANBTG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
CLDR-18889
That’s all folks.
regenerate GenerateLocaleIDTestData
I would update the docs to recommend running this tool after generating algorithmic, but the docs aren't migrated over https://unicode-org.atlassian.net/browse/CLDR-18606?focusedCommentId=186098 update I did update the docs.
TestLocale doesn't look like it is a useful test, filed https://unicode-org.atlassian.net/browse/CLDR-18890 - but until then, I updated the test to suggest how to correct the problem.
adds a
@CDATA
tag to rbnf - CLDR-8909 - noting that the element is expected to be a CDATA elementALLOW_MANY_COMMITS=true