Segmentation test cases are being generated based on data for the previous version of Unicode #1072

Closed
eggrobin opened this issue Mar 21, 2025 · 3 comments · Fixed by #1075

eggrobin commented Mar 21, 2025

See the Unicode mailing list email titled Question Regarding UCD Draft Files and GraphemeBreakTest Discrepancy.

@eggrobin eggrobin self-assigned this Mar 22, 2025

eggrobin commented Mar 22, 2025

I understand the issue. Segmenter.java parses its UnicodeSets using IndexUnicodeProperties.make().getXSymbolTable(), which constructs a plain UnicodeProperty.MyXSymbolTable. That class implements a very minimal applyPropertyAlias, which does not support what I call a unary-query-expression in my draft specification. Worse, it fails silently (returning false), which means that we then use ICU properties. Thus \p{Extended_Pictographic} is computed by ICU, with Unicode 16.0 data, that is, data for the previous version of Unicode.

For a binary-query-expression, the MyXSymbolTable works as intended and does not fall back to ICU, which is why \p{lb=HH} (which ICU would reject) worked fine in #1046.
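The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It does not use the actual Unicode Tools or ICU classes: `FallbackSketch`, `minimalResolve`, `resolveLenient`, `resolveStrict`, and the two maps are hypothetical stand-ins, and the sketch signals failure with an empty `Optional` where the real `applyPropertyAlias` returns false. It only shows how a resolver that silently declines unary queries lets a caller fall back to a different data source, and what failing loudly would look like instead.

```java
import java.util.Map;
import java.util.Optional;

public class FallbackSketch {
    // Hypothetical stand-ins: each map sends a query expression to a label
    // naming the data source that answered it.
    static final Map<String, String> TOOLS_BINARY =
            Map.of("lb=HH", "tools data (draft)");
    static final Map<String, String> LIBRARY =
            Map.of("Extended_Pictographic", "library data (16.0)");

    /** Minimal resolver: handles only property=value queries. */
    static Optional<String> minimalResolve(String query) {
        if (!query.contains("=")) {
            // Unary query: not supported, and reported only as "no result".
            return Optional.empty();
        }
        return Optional.ofNullable(TOOLS_BINARY.get(query));
    }

    /** Lenient caller: silently falls back to the library's own data. */
    static String resolveLenient(String query) {
        return minimalResolve(query).orElseGet(() -> LIBRARY.get(query));
    }

    /** Strict caller: refuses to fall back and reports the unsupported query. */
    static String resolveStrict(String query) {
        return minimalResolve(query).orElseThrow(() ->
                new IllegalArgumentException("Unsupported query: \\p{" + query + "}"));
    }

    public static void main(String[] args) {
        System.out.println(resolveLenient("lb=HH"));
        System.out.println(resolveLenient("Extended_Pictographic"));
        try {
            resolveStrict("Extended_Pictographic");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the binary query is answered from the tools data in both modes, while the unary query is answered from the stale library data in lenient mode and rejected outright in strict mode.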

The root cause is that there are no fewer than six class .* extends .*XSymbolTable in this codebase, which is around five too many. MySymbolTable in UnicodeSetUtilities.java is probably the least broken of the lot (I have been fixing some corners of it lately). See also:

@markusicu

> it fails silently (returning false), which means that we then use ICU properties

Probably stating the obvious:
In the context of the Unicode Tools, we should never "fall back" to ICU properties, especially not silently.
Use of ICU properties should be limited to consistency testing and comparison tools, if any.


macchiati commented Mar 22, 2025 via email
