Skip to content

CheckProperties exception for RGI_Emoji_Flag_Sequence #540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
markusicu opened this issue Sep 19, 2023 · 8 comments
Open

CheckProperties exception for RGI_Emoji_Flag_Sequence #540

markusicu opened this issue Sep 19, 2023 · 8 comments

Comments

@markusicu
Copy link
Member

markusicu commented Sep 19, 2023

When I run the unicodetools "CheckProperties" I see an exception getting thrown and printed, but the program continues and exits without a failure code.

Is this something to worry about?

Comment from KenW: The only difference I am seeing between 15.0 and 15.1 is that the flag of Turkey ended up described as "flag: Türkiye"

Thoughts from Markus:

  • If there is a problem, then we should fix it.
  • If there is no problem, then it should not throw an exception.
  • CheckProperties seems to swallow the exception, and finishes with exit code 0. This could hide real problems.
RGI_Emoji_Flag_Sequence 15.1.0.0
com.ibm.icu.util.ICUException: RGI_Emoji_Flag_Sequence( from: /usr/local/google/home/mscherer/unitools/mine/src/unicodetools/data/emoji/15.1/emoji-sequences.txt)
at org.unicode.props.IndexUnicodeProperties.load(IndexUnicodeProperties.java:446)
at org.unicode.propstest.CheckProperties.compare(CheckProperties.java:662)
at org.unicode.propstest.CheckProperties.main(CheckProperties.java:219)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.unicode.props.UnicodePropertyException: Key already present in Map: Basic_Emoji, old: 0000-2319=No
231A-231B=Yes
231C-23E8=No
23E9-23EC=Yes
...
1F6F0,FE0F=Yes
1F6F3,FE0F=Yes
, new:
at org.unicode.props.PropertyUtilities.putNew(PropertyUtilities.java:33)
at org.unicode.props.PropertyParsingInfo.parseSourceFile(PropertyParsingInfo.java:465)
at org.unicode.props.IndexUnicodeProperties.load(IndexUnicodeProperties.java:443)
... 8 more

@nedley @kenlunde @macchiati

@nedley
Copy link
Contributor

nedley commented Sep 17, 2024

Running the tool just now gives me the same exception but for a different property:

Modifier_Combining_Mark	15.1.0.0
com.ibm.icu.util.ICUException: Modifier_Combining_Mark( from: /Volumes/Common/projects/unicode/unicodetools/unicodetools/data/ucd/15.1.0/PropList.txt)
	at org.unicode.props.IndexUnicodeProperties.load(IndexUnicodeProperties.java:492)
	at org.unicode.props.IndexUnicodeProperties.load(IndexUnicodeProperties.java:454)
	at org.unicode.propstest.CheckProperties.compare(CheckProperties.java:656)
	at org.unicode.propstest.CheckProperties.main(CheckProperties.java:219)
Caused by: org.unicode.props.UnicodePropertyException: Key already present in Map: Logical_Order_Exception,	old: 0000-0E3F=No
0E40-0E44=Yes
0E45-0EBF=No
0EC0-0EC4=Yes
0EC5-19B4=No
19B5-19B7=Yes
19B8-19B9=No
19BA=Yes
19BB-AAB4=No
AAB5-AAB6=Yes
AAB7-AAB8=No
AAB9=Yes
AABA=No
AABB-AABC=Yes
AABD-10FFFF=No
,	new: 
	at org.unicode.props.PropertyUtilities.putNew(PropertyUtilities.java:34)
	at org.unicode.props.PropertyParsingInfo.parseSourceFile(PropertyParsingInfo.java:501)
	at org.unicode.props.IndexUnicodeProperties.load(IndexUnicodeProperties.java:488)
	... 3 more

@nedley nedley removed their assignment Sep 17, 2024
@eggrobin
Copy link
Member

eggrobin commented Apr 3, 2025

Thoughts from Markus:
CheckProperties seems to swallow the exception, and finishes with exit code 0. This could hide real problems.

CheckProperties was turned into a CI step by #253 for #187. But it was never written as a CI test; it was a tool you ran and whose output you read.

The documentation says

Run /unicodetools/org/unicode/propstest/CheckProperties.java to see what changed
and sanity check.

Fix any failures, such as: [an exception]

This means that we are not testing whatever was only tested by that, and I do not think we are manually running this tool and looking at its output anymore either.

And I think the validity of values in data files is one of those things; this would explain the Arabic_Tashkil problem noted by @jowilco.

@jowilco
Copy link
Contributor

jowilco commented Apr 3, 2025

Missing enum values are being caught by org.unicode.props.PropertyParsingInfo#checkEnum. The errors are stored in DATA_LOADING_ERRORS, but the only place that DATA_LOADING_ERRORS is being read is in CheckProperties.

@eggrobin
Copy link
Member

eggrobin commented Apr 3, 2025

Indeed.

Which would fail, if it were a thing that could fail. In fact it fails so much (because of unmaintained regexes) that the checkEnum failure isn’t actually printed (but it occurs, see the CI failure on the second commit in #1083).

What a mess.

I think we should change checkEnum to throw directly, and deal with the rest of the CheckProperties errors another day—the regexes are known to be out of date, for instance.

@jowilco, @markusicu, what do you think?

@jowilco
Copy link
Contributor

jowilco commented Apr 3, 2025

That works for me, but I'm not sure that I know enough to make the call.

@markusicu
Copy link
Member Author

I think we should change checkEnum to throw directly, and deal with the rest of the CheckProperties errors another day—the regexes are known to be out of date, for instance.

@jowilco, @markusicu, what do you think?

sgtm
changing checkEnum() to throw, and fixing what then fails, also sounds painful...

@eggrobin
Copy link
Member

eggrobin commented Apr 7, 2025

changing checkEnum() to throw, and fixing what then fails, also sounds painful...

It is not too bad for the current version.
On older versions it is a different story (today I learned about CCC=37 for U+0901 until Unicode 3.0…)

@markusicu
Copy link
Member Author

(today I learned about CCC=37 for U+0901 until Unicode 3.0…)

I got curious about that. That went away in Unicode 2.1.8:

In those days, many things about Unicode were not stable that have long since been stabilized. I believe Unicode normalization was stabilized in 3.0, modulo the NormalizationCorrections until 4.1 or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants