Skip to content

One symbol table to rule them all #1080

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Apr 2, 2025

Conversation

eggrobin
Copy link
Member

Clearly our problem of proliferation of symbol tables will be solved by adding yet another one; so far not used, and tested against examples from the working draft UnicodeSet spec.

This should be able to replace most of the existing symbol tables as-is. It needs some tweaking before I can use it in the invariant tests, so that it can use the versioned ToolUnicodeProperties1 in some circumstances.

Towards #1073 and #1074.

Footnotes

  1. Yes, those ones: Retire ToolUnicodeProperties & UCD.java #639. We’re not retiring them just yet…

@eggrobin eggrobin requested a review from markusicu March 31, 2025 08:49
@markusicu
Copy link
Member

https://xkcd.com/927/

@markusicu
Copy link
Member

4 failing CI checks

// Either unary-property-query, or binary-property-query with an empty property-value.
final var script = queriedProperties.getProperty(UcdProperty.Script);
final var generalCategory = queriedProperties.getProperty(UcdProperty.General_Category);
if (script.isValidValue(unqualifiedLeftHandSide.toString())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of .toString() calls here. please make a String after parsing the version out of leftHandSide. if there was no version prefix, then the leftHandSide itself is already the String you want.

probably easiest to do something like

if (there was a version prefix) {
    leftHandSide = unqualifiedLeftHandSide.toString();
}
... use leftHandSide ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but kept the long name because I think I will need to look at the original leftHandSide in a future revision.

final var queriedProperties = IndexUnicodeProperties.make(deducedQueriedVersion);

if (propertyPredicate.length() == 0) {
// Either unary-property-query, or binary-property-query with an empty property-value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should first check if leftHandSide is a valid property name, and do a normal prop=value lookup, before trying the script and gc shortcuts. Except for the isc=C issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I probably want to do that once I can deal with the isc=C issue by looking at whether there is an =.

This should be possible by bumping the ICU dependency to a snapshot past unicode-org/icu#3456, but I will look at that in a subsequent PR.

@eggrobin eggrobin requested a review from markusicu April 1, 2025 19:26
@eggrobin
Copy link
Member Author

eggrobin commented Apr 1, 2025

Note: The failing Python thing is #1081.

Comment on lines 463 to 465
return getSet(
propertyValue == null
? NULL_MATCHER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider checking for null and returning getSet(null matcher) before determining a comparator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This made me think that this does a tiny bit of value validation (checking that the delimiter is not in the query), and made me wonder whether it should do more (in particular, null is valid only for some property types).

Probably a question for another day and another PR.

@eggrobin eggrobin requested a review from markusicu April 1, 2025 22:54
@eggrobin eggrobin merged commit cc73c4e into unicode-org:main Apr 2, 2025
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants