-
-
Notifications
You must be signed in to change notification settings - Fork 44
One symbol table to rule them all #1080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
One symbol table to rule them all #1080
Conversation
4 failing CI checks |
unicodetools/src/main/java/org/unicode/props/UnicodeProperty.java
Outdated
Show resolved
Hide resolved
unicodetools/src/main/java/org/unicode/props/UnicodeProperty.java
Outdated
Show resolved
Hide resolved
unicodetools/src/main/java/org/unicode/text/UCD/VersionedSymbolTable.java
Outdated
Show resolved
Hide resolved
unicodetools/src/main/java/org/unicode/text/UCD/VersionedSymbolTable.java
Outdated
Show resolved
Hide resolved
unicodetools/src/main/java/org/unicode/text/UCD/VersionedSymbolTable.java
Outdated
Show resolved
Hide resolved
// Either unary-property-query, or binary-property-query with an empty property-value. | ||
final var script = queriedProperties.getProperty(UcdProperty.Script); | ||
final var generalCategory = queriedProperties.getProperty(UcdProperty.General_Category); | ||
if (script.isValidValue(unqualifiedLeftHandSide.toString())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lots of .toString() calls here. please make a String after parsing the version out of leftHandSide. if there was no version prefix, then the leftHandSide itself is already the String you want.
probably easiest to do something like
if (there was a version prefix) {
leftHandSide = unqualifiedLeftHandSide.toString();
}
... use leftHandSide ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but kept the long name because I think I will need to look at the original leftHandSide in a future revision.
final var queriedProperties = IndexUnicodeProperties.make(deducedQueriedVersion); | ||
|
||
if (propertyPredicate.length() == 0) { | ||
// Either unary-property-query, or binary-property-query with an empty property-value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should first check if leftHandSide is a valid property name, and do a normal prop=value lookup, before trying the script and gc shortcuts. Except for the isc=C issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I probably want to do that once I can deal with the isc=C issue by looking at whether there is an =.
This should be possible by bumping the ICU dependency to a snapshot past unicode-org/icu#3456, but I will look at that in a subsequent PR.
unicodetools/src/main/java/org/unicode/text/UCD/VersionedSymbolTable.java
Show resolved
Hide resolved
unicodetools/src/main/java/org/unicode/text/UCD/VersionedSymbolTable.java
Show resolved
Hide resolved
unicodetools/src/test/java/org/unicode/text/UCD/TestVersionedSymbolTable.java
Outdated
Show resolved
Hide resolved
Note: The failing Python thing is #1081. |
return getSet( | ||
propertyValue == null | ||
? NULL_MATCHER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider checking for null and returning getSet(null matcher) before determining a comparator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This made me think that this does a tiny bit of value validation (checking that the delimiter is not in the query), and made me wonder whether it should do more (in particular, null is valid only for some property types).
Probably a question for another day and another PR.
Clearly our problem of proliferation of symbol tables will be solved by adding yet another one; so far not used, and tested against examples from the working draft UnicodeSet spec.
This should be able to replace most of the existing symbol tables as-is. It needs some tweaking before I can use it in the invariant tests, so that it can use the versioned ToolUnicodeProperties1 in some circumstances.
Towards #1073 and #1074.
Footnotes
Yes, those ones: Retire ToolUnicodeProperties & UCD.java #639. We’re not retiring them just yet… ↩