Inflection-66 Fix --ignore-entries-with-grammemes in dictionary-parser and improve language variant handling #67
…r and improve language variant handling
int qVariantIdx = currentLemmaLanguage.indexOf(VARIANT_SEPARATOR);
if (qVariantIdx >= 0) {
    // The languages can have a weird Q entry after the desired language.
    // A spelling variant is informative. Most of the rest are irrelevant.
    var additionalCategory = currentLemmaLanguage.substring(qVariantIdx + VARIANT_SEPARATOR.length());
    currentLemmaLanguage = currentLemmaLanguage.substring(0, qVariantIdx);
    var variant = Grammar.getMappedGrammemes(additionalCategory);
    if (variant == null) {
        if (parserOptions.debug) {
            System.err.println("Line " + lineNumber + ": " + additionalCategory + " is not a known grammeme for the language variant " + lexeme.id + "(" + lemma.value + ")");
        }
        continue;
    }
    lemma.grammemes.addAll(variant);
}
This chunk helps the parser ignore some of the language variants. It might need to be extended further to cover the inflections.
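For readers following the hunk above without the full parser, here is a minimal, self-contained sketch of the same split-and-look-up step. It is not the project's code: the separator value `-x-`, the `KNOWN_VARIANTS` table (a stand-in for `Grammar.getMappedGrammemes`), the placeholder ID `Q123`, and the `Split` holder are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class VariantSplitSketch {
    // Assumption: the real VARIANT_SEPARATOR value is not shown in the diff.
    static final String VARIANT_SEPARATOR = "-x-";

    // Hypothetical stand-in for Grammar.getMappedGrammemes(); "Q123" is a placeholder ID.
    static final Map<String, List<String>> KNOWN_VARIANTS = Map.of("Q123", List.of("variant-a"));

    /** Base language code plus the grammemes implied by the variant suffix, if any. */
    static final class Split {
        final String language;
        final List<String> variantGrammemes;

        Split(String language, List<String> variantGrammemes) {
            this.language = language;
            this.variantGrammemes = variantGrammemes;
        }
    }

    /** Returns empty when the suffix is unknown, mirroring the `continue` in the hunk. */
    static Optional<Split> split(String lemmaLanguage) {
        int idx = lemmaLanguage.indexOf(VARIANT_SEPARATOR);
        if (idx < 0) {
            // No variant suffix: keep the language unchanged.
            return Optional.of(new Split(lemmaLanguage, List.of()));
        }
        String additionalCategory = lemmaLanguage.substring(idx + VARIANT_SEPARATOR.length());
        List<String> variant = KNOWN_VARIANTS.get(additionalCategory);
        if (variant == null) {
            return Optional.empty();
        }
        return Optional.of(new Split(lemmaLanguage.substring(0, idx), variant));
    }
}
```

Under these assumptions, `split("en-x-Q123")` yields the language `en` with the mapped grammemes, `split("en")` passes through untouched, and `split("en-x-Q999")` is skipped.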
void setIgnoreProperty(String[] grammemes, Ignorable ignorable) {
    var ignorableSet = EnumSet.of(ignorable);
    for (String grammeme : grammemes) {
        if (grammeme.matches("Q\\d*")) {
            TYPEMAP.put(grammeme, ignorableSet);
        }
        else {
            for (Map.Entry<String, Set<? extends Enum<?>>> entry : TYPEMAP.entrySet()) {
                for (var grammemeEnum : entry.getValue()) {
                    String name = grammemeEnum.name();
                    if (name.equalsIgnoreCase(grammeme)) {
                        if (entry.getValue().size() == 1) {
                            entry.setValue(ignorableSet);
                        }
                        else {
                            entry.getValue().remove(grammemeEnum);
                            ArrayList<Enum<?>> clone = new ArrayList<>(entry.getValue());
                            clone.add(ignorable);
                            entry.setValue(new HashSet<>(clone));
                        }
                        break;
                    }
                }
            }
        }
    }
}
This fixes the ignorable properties and ignorable inflections, which tend to include a lot of irrelevant information.
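To make the effect of `setIgnoreProperty` easier to see in isolation, here is a hedged, standalone sketch of the same idea over a `String`-keyed map. The enums `Ignorable`, `GrammaticalNumber`, and `Register`, along with any keys placed in the map, are hypothetical stand-ins rather than the project's real types. The sketch also copies each value set before editing it instead of removing elements in place, which is a deliberate simplification and not what the hunk does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IgnorePropertySketch {
    // Hypothetical enums; the real grammeme enums live in the dictionary parser.
    enum Ignorable { IGNORABLE_PROPERTY }
    enum GrammaticalNumber { SINGULAR, PLURAL }
    enum Register { INFORMAL }

    // Keyed by a single string, as in the simplified TYPEMAP.
    static final Map<String, Set<Enum<?>>> TYPEMAP = new HashMap<>();

    static void setIgnoreProperty(String[] grammemes, Ignorable ignorable) {
        for (String grammeme : grammemes) {
            if (grammeme.matches("Q\\d*")) {
                // A raw Q identifier is mapped directly to the ignorable marker.
                Set<Enum<?>> marker = new HashSet<>();
                marker.add(ignorable);
                TYPEMAP.put(grammeme, marker);
                continue;
            }
            // Otherwise, swap the named grammeme for the marker wherever it appears.
            for (Map.Entry<String, Set<Enum<?>>> entry : TYPEMAP.entrySet()) {
                Set<Enum<?>> updated = new HashSet<>(entry.getValue());
                if (updated.removeIf(e -> e.name().equalsIgnoreCase(grammeme))) {
                    updated.add(ignorable);
                    entry.setValue(updated); // replacing a value is safe during entrySet iteration
                }
            }
        }
    }
}
```

With a map holding `{SINGULAR}` under one key and `{PLURAL, INFORMAL}` under another, calling `setIgnoreProperty(new String[] {"plural", "Q42"}, Ignorable.IGNORABLE_PROPERTY)` leaves the first entry alone, turns the second into `{INFORMAL, IGNORABLE_PROPERTY}`, and adds a `Q42` entry holding only the marker.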
}

-static final Map<SortedSet<String>, Set<? extends Enum<?>>> TYPEMAP = new HashMap<>(1021);
+static final Map<String, Set<? extends Enum<?>>> TYPEMAP = new HashMap<>(1021);
I changed the key type, since Wikidata doesn't include a key set. This is a simplification.
Resolves #66
These changes reduce the failures for English, but they don't fully fix the issues yet. It's an improvement and a step in the right direction.