Skip to content

there can be only one UnicodeProperty.java #66

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
markusicu opened this issue May 29, 2021 · 18 comments
Open

there can be only one UnicodeProperty.java #66

markusicu opened this issue May 29, 2021 · 18 comments

Comments

@markusicu
Copy link
Member

markusicu commented May 29, 2021

Background: ucd-dev email thread “Unicode Tools vs. UnicodeProperty.java

We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools.
Until 1.5 years ago they were identical, but they have been diverging and are now no longer even interface-compatible. The CLDR version is the newer one.

For now, I am adding a note to the setup instructions to move cldr-code above unicodetools in the build path, but of course that's evil and brittle.

We had agreed to move UnicodeProperty from CLDR into the Unicode Tools, but other CLDR code depends on this class, and I am not sure if I have time to get through that.
Maybe I just rename the unused Unicode Tools version at first.

There is also a third version of UnicodeProperty.java, in the UnicodeJsps code, but thankfully it is in a different package. Still, this is confusing and error-prone; we should either merge the classes or give them distinct names.

@srl295
Copy link
Member

srl295 commented Oct 1, 2021

@macchiati
Copy link
Member

macchiati commented Oct 1, 2021 via email

@srl295
Copy link
Member

srl295 commented Oct 4, 2021

@josh-hadley
Copy link
Collaborator

As @srl295 posted, I have performed a basic analysis of duplicated .java files as a starting point. I attempted a very basic diff to try to triage a bit but I think that was not terribly useful with the multi-file overlaps. Sadly, looks like only one duplicated file is identical.

So I think the next step will be to look more closely into the overlaps and determine commonalities/differences inside and figure out what can be re-worked, removed, etc.

@srl295
Copy link
Member

srl295 commented Nov 9, 2021

Now that #148 (#131) is merged, the JSPs can easily call into unicodetools librarycode.

@srl295
Copy link
Member

srl295 commented Mar 23, 2022

this bit me trying to do #186 (IDNA)— because of the order of loading, ICU4J's UnicodeSet ends up calling into unicodetools' UnicodeProperty and crashes.

@macchiati
Copy link
Member

macchiati commented Mar 23, 2022 via email

@srl295
Copy link
Member

srl295 commented Mar 23, 2022

@macchiati I'm going to try moving unicodetools' version into org.unicode.props …

@srl295
Copy link
Member

srl295 commented Mar 23, 2022

Back burner on this due to CLDR release, but just as an example of the layering problems:

private static void doCompare(UnicodeProperty.Factory factory, Segmenter rl, String line) {
RandomStringGenerator rsg;
RuleBasedBreakIterator icuBreak;
if (line.equals("compareGrapheme")) {
rsg = new RandomStringGenerator(factory, "GraphemeClusterBreak");

    private static void doCompare(UnicodeProperty.Factory factory, Segmenter rl, String line) {
        RandomStringGenerator rsg;
        RuleBasedBreakIterator icuBreak;
        if (line.equals("compareGrapheme")) {
            rsg = new RandomStringGenerator(factory, "GraphemeClusterBreak");

So here we are calling into CLDR (RandomStringGenerator) but passing it unicodetools' version of UnicodeProperty.Factory. The only way this “works” is basically that UnicodeProperty.class from CLDR gets replaced at runtime with unicodetools' … not good

@markusicu
Copy link
Member Author

It looks like CLDR RandomStringGenerator has other problems, too. It sounds generic, but its init() function is specific to working with the GCB property. And when you call the (Factory, String, bool, bool) constructor, you get some data from the input factory and other data from the ICUPropertyFactory.

If this is used from CLDR code, then I would clone it into Unicode Tools (with a different package and modified class name). If it's not used from CLDR code, then move it into Unicode Tools.

It might also be possible to pull all of the factory calls out into the call site and construct this generator only with UnicodeSet/UnicodeMap/String/boolean values.

srl295 added a commit that referenced this issue Mar 24, 2022
- also move some code out of UnicodeJsps into org.unicode.jsp

Relates to #66
srl295 added a commit that referenced this issue Mar 24, 2022
…ackage

- also move some code out of UnicodeJsps into org.unicode.jsp

Relates to #66
srl295 added a commit that referenced this issue Mar 24, 2022
- deprecate org.unicode.jsp.UnicodeProperty

Relates to #66
srl295 added a commit that referenced this issue Mar 24, 2022
- deprecate org.unicode.jsp.UnicodeProperty

Relates to #66
@srl295
Copy link
Member

srl295 commented Mar 24, 2022

Observations (not complaints):

  • it's very clear that our development tools / environment / process has had quite an impact on how things are structured. Copying source files was easy, dependencies and reuse between projects hard/impossible, even between the unicodetools and the JSPs
  • I think CLDR should be split up into modules. Maybe it should depend on unicodetools and not the other way around!

@markusicu
Copy link
Member Author

I verified that the RandomStringGenerator is not used in CLDR. Please move it out of there and into the Unicode Tools.
https://unicode-org.atlassian.net/browse/CLDR-15482

@srl295
Copy link
Member

srl295 commented Mar 24, 2022

@markusicu my #214 does copy it into unicodetools paving the way for deletion on cldr.

@srl295
Copy link
Member

srl295 commented Mar 25, 2022

It's "now" illegal (as of JDK 9?) to have more than one module export the same package. So, CLDR and ICU aren't supposed to both export to com.ibm.icu.text.

srl295 added a commit that referenced this issue Apr 5, 2022
@markusicu
Copy link
Member Author

Something I learned (and others might have realized some time ago):

The Unicode Tools use the CLDR UnicodePropertySymbolTable which calls into UnicodeProperty. That must be a CLDR UnicodeProperty, which is what the symbol table code calls from within the CLDR jar. The Tools provide the symbol table with a ToolUnicodeProperty (which lives in the Tools) which extends UnicodeProperty. Bad things happen when CLDR code uses the Unicode Tools version of UnicodeProperty.

Unicode Tools code needs the Unicode Tools version of UnicodeProperty.

David changed CLDR UnicodeProperty to base its Matcher classes on a Java Predicate:
https://unicode-org.atlassian.net/browse/CLDR-13750
unicode-org/cldr#460

The Unicode Tools version is still basing the Matchers on an older ObjectMatcher interface.

@markusicu
Copy link
Member Author

FYI
In pull request #214 @srl295 copied most of the classes from https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/java/org/unicode/cldr/util/props into the unicodetools.
It looks like only abstract class UnicodeLabel (the base for all of the UnicodeProperty classes) still exists only in CLDR.

@markusicu
Copy link
Member Author

meld ~/cldr/uni/src/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java unicodetools/src/main/java/org/unicode/props/UnicodeProperty.java

@markusicu
Copy link
Member Author

@eggrobin just deleted the UnicodeJSPs version in PR #649.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants