-
-
Notifications
You must be signed in to change notification settings - Fork 44
Correctly parse Numeric_Value since 1.0 #1123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ing the history display code. Do not show version ranges where we do not have version data.
@@ -227,6 +238,17 @@ public String getFileName(VersionInfo ucdVersionRequested) { | |||
} | |||
} | |||
|
|||
public int getFieldNumber(VersionInfo ucdVersionRequested) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we optimize this a little?
For example, if fieldNumbers.size() == 1 then just return the one and only value.
Maybe even better, once the parsing info is all built, populate an int singleFieldNumber
with the one and only value if there is exactly one. If there are two or more, then set that to -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if fieldNumbers.size() == 1 then just return the one and only value.
Sure. Done.
Maybe even better, once the parsing info is all built, populate an int singleFieldNumber with the one and only value if there is exactly one. If there are two or more, then set that to -1.
That is probably not the direction this wants to go in. This is clearly in dire need of refactoring, because this logic of version-dependent chunk of parsing info is now triplicated (we do this for field number, file, default value). This should just get transposed into a type that has (field number, file, default value, whatever else we want to make version-dependent), and then a map keyed on version. I don’t think we have much to gain from making the map sparse (the maps are sparse here because they match contents of the source files, which is rarely a good guide for data structures).
But that refactoring isn’t going to happen today.
Fix #1118.
When it exists, use DerivedNumericValues to tell us:
But use UnicodeData as the source of truth if available. This is preferable because of the lack of star in https://www.unicode.org/reports/tr44/#Derived_Extracted, but also because it is the only thing that works in the past.
Before DerivedNumericValues, use UnicodeData alone (at that time, Unihan did not contribute to Numeric_Value, so this is correct).
In addition, this allows us to read the data we would want to write back (such as 2/12 for U+109F7), which seems like the right way to go in the future. Right now we are reverse-engineering that from the name when generating UnicodeData, which is very silly.
unicodetools/unicodetools/src/main/java/org/unicode/text/UCD/MakeUnicodeFiles.java
Lines 2220 to 2257 in c2c5ba9