Description
ICU4X collation 2.0.0 shows only 5 failing tests!
Note also that these tests pass in the ICU4C, ICU4J, and NodeJS implementations of collation.
Each of these tests uses locale 'root', and the test strings contain U+FFFE combined with other characters. This appears to be an error in ICU4X collation. Earlier versions of ICU4X (1.3, 1.4, etc.) show the same set of failures.
The source for these collation tests is testgen/icu77/collationtest.txt, starting at line 360. ICU4C uses this same test file for its internal testing.
** test: U+FFFE on identical level
@ root
% strength=identical
* compare
# All of these control codes are completely-ignorable, so that
# their low code points are compared with the merge separator.
# The merge separator must compare less than any other character.
<1 \uFFFE\u0001\u0002\u0003
<i \u0001\uFFFE\u0002\u0003
<i \u0001\u0002\uFFFE\u0003
<i \u0001\u0002\u0003\uFFFE
* compare
# The merge separator must even compare less than U+0000.
<1 \uFFFE\u0000\u0000
<i \u0000\uFFFE\u0000
<i \u0000\u0000\uFFFE
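The rule stated in the test comments (U+FFFE, the merge separator, must compare lower than every other character, even U+0000) can be sketched as an identical-level tie-break in Python. This is a minimal illustration under stated assumptions, not how ICU4X, ICU4C, or ICU4J implement it: it assumes the strings already compare equal on levels 1 to 3, which holds here because the control codes are completely ignorable and each string contains exactly one U+FFFE. The names `identical_level_key` and `compare_identical` are hypothetical.

```python
import unicodedata

MERGE_SEPARATOR = "\ufffe"


def identical_level_key(s):
    """Hypothetical helper: sort key for the identical-level tie-break.

    Uses the NFD code points of s, but ranks U+FFFE (the merge separator)
    as -1 so it sorts below every other code point, including U+0000.
    """
    nfd = unicodedata.normalize("NFD", s)
    return [-1 if ch == MERGE_SEPARATOR else ord(ch) for ch in nfd]


def compare_identical(s1, s2):
    """Hypothetical helper: return -1, 0 or 1 for the identical-level tie-break.

    Python compares lists lexicographically and a proper prefix sorts first,
    so end-of-string ranks below U+FFFE.
    """
    k1, k2 = identical_level_key(s1), identical_level_key(s2)
    return (k1 > k2) - (k1 < k2)


# U+FFFE must sort below U+0001, and even below U+0000, at this level.
print(compare_identical("\ufffe\u0001\u0002\u0003", "\u0001\ufffe\u0002\u0003"))  # -1
print(compare_identical("\u0000\ufffe\u0000", "\u0000\u0000\ufffe"))  # -1
```

A real collator only reaches this step after levels 1 to 3 compare equal; the sketch deliberately leaves those levels out.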
The test file is processed by collation.py to produce a JSON file. Below are the JSON entries for these 5 failures, labeled "00160" through "00165". Note that "00163" passes, but that test just compares against s1, which is the empty string.
{
"test_type": "collation",
"tests": [
...
{
"compare_type": "<i",
"s1": "\ufffe\u0001\u0002\u0003",
"s2": "\u0001\ufffe\u0002\u0003",
"source_file": "collationtest.txt",
"line": 367,
"label": "00160",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "8a6ec2612f6466afc149ca9b899ae40ab1a9dcf4"
},
{
"compare_type": "<i",
"s1": "\u0001\ufffe\u0002\u0003",
"s2": "\u0001\u0002\ufffe\u0003",
"source_file": "collationtest.txt",
"line": 368,
"label": "00161",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "82c664591829f157424009bb79f5bace7b3f24d1"
},
{
"compare_type": "<i",
"s1": "\u0001\u0002\ufffe\u0003",
"s2": "\u0001\u0002\u0003\ufffe",
"source_file": "collationtest.txt",
"line": 369,
"label": "00162",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "2906513e52bfe5dc9c9bad0043aada9bc7796048"
},
{
"compare_type": "<1",
"s1": "",
"s2": "\ufffe\u0000\u0000",
"source_file": "collationtest.txt",
"line": 373,
"label": "00163",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "a46eaaedfee83fd8c28188dfc4b57680265bec51"
},
{
"compare_type": "<i",
"s1": "\ufffe\u0000\u0000",
"s2": "\u0000\ufffe\u0000",
"source_file": "collationtest.txt",
"line": 374,
"label": "00164",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "f37df13ebf22bc59171cdeb29e8d0af2d68cbdca"
},
{
"compare_type": "<i",
"s1": "\u0000\ufffe\u0000",
"s2": "\u0000\u0000\ufffe",
"source_file": "collationtest.txt",
"line": 375,
"label": "00165",
"locale": "root",
"test_description": "U+FFFE on identical level",
"strength": "identical",
"hexhash": "7779f4509820e87810bcbd78eb53e4958f330361"
},
...
]
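The sketch below contrasts plain code point order with the merge-separator order required by collationtest.txt, using the s1/s2 values from the JSON entries above. It illustrates one possible explanation of the failure pattern, not a confirmed diagnosis: an implementation that compared raw code points at the identical level without special-casing U+FFFE would reverse exactly 00160, 00161, 00162, 00164, and 00165, while still passing 00163, because an empty s1 sorts first under any order. This has not been checked against the ICU4X source; `naive_key` and `merge_separator_key` are hypothetical helpers.

```python
import json

# s1/s2 copied from the JSON entries above (other fields omitted).
# A raw string keeps the \uXXXX escapes so json.loads decodes them.
CASES = json.loads(r"""[
  {"label": "00160", "s1": "\ufffe\u0001\u0002\u0003", "s2": "\u0001\ufffe\u0002\u0003"},
  {"label": "00161", "s1": "\u0001\ufffe\u0002\u0003", "s2": "\u0001\u0002\ufffe\u0003"},
  {"label": "00162", "s1": "\u0001\u0002\ufffe\u0003", "s2": "\u0001\u0002\u0003\ufffe"},
  {"label": "00163", "s1": "", "s2": "\ufffe\u0000\u0000"},
  {"label": "00164", "s1": "\ufffe\u0000\u0000", "s2": "\u0000\ufffe\u0000"},
  {"label": "00165", "s1": "\u0000\ufffe\u0000", "s2": "\u0000\u0000\ufffe"}
]""")


def naive_key(s):
    """Hypothetical: plain code point order, where U+FFFE sorts above U+0000 to U+0003."""
    return [ord(ch) for ch in s]


def merge_separator_key(s):
    """Hypothetical: order required by the test comments, with U+FFFE below all code points.

    NFD is skipped because these control characters are unchanged by it.
    """
    return [-1 if ch == "\ufffe" else ord(ch) for ch in s]


for case in CASES:
    expected = merge_separator_key(case["s1"]) < merge_separator_key(case["s2"])
    naive = naive_key(case["s1"]) < naive_key(case["s2"])
    print(case["label"], "expected s1<s2:", expected, "naive s1<s2:", naive)
```

Running it prints `expected s1<s2: True` for all six labels, and `naive s1<s2: False` for every label except 00163.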