@grhoten grhoten commented Mar 10, 2025

Resolves #94

This PR makes changes that reduce the number of test failures across the languages being transitioned to Wikidata.

@grhoten grhoten requested a review from nciric March 10, 2025 07:17
#
# These are lexemes that should either be ignored due to irrelevance that can't be easily tagged as irrelevant,
# or words that are just not that common that should be sorted last in the inflection patterns.
L128740=omit
Member Author

This noun is atypical, and it conflicts with the common pronoun. Omit it for now to resolve the conflict.

# These are lexemes that should either be ignored due to irrelevance that can't be easily tagged as irrelevant,
# or words that are just not that common that should be sorted last in the inflection patterns.
L128740=omit
L166820=omit
Member Author

This is стола, which is also the feminine form of стол (table), so there is a conflict. There are ways to resolve it, but let's exclude this lexeme for now.

@@ -2,9 +2,10 @@
#
# These are lexemes that should either be ignored due to irrelevance that can't be easily tagged as irrelevant,
# or words that are just not that common that should be sorted last in the inflection patterns.
L15388=rare
L299075=omit
# TODO remove this, since it is fixed upstream.
L342586=omit
Member Author

This can be removed after the next Wikidata dump is consumed.
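
For readers unfamiliar with the override entries in these hunks, each one has the shape `<lexeme ID>=<action>`, with `#` lines as comments. The following is only a minimal sketch of how such a file might be read. The class name `LexemeOverrides` and the method `parse` are hypothetical and are not taken from the dictionary-parser code, and the sketch does not interpret what `omit` or `rare` do downstream.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual parser.
public class LexemeOverrides {
    /** Parses lines such as "L128740=omit"; comment lines ('#') and blank lines are skipped. */
    public static Map<String, String> parse(List<String> lines) {
        // LinkedHashMap keeps the entries in file order.
        Map<String, String> overrides = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected <lexeme id>=<action>: " + line);
            }
            overrides.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return overrides;
    }

    public static void main(String[] args) {
        // Entries copied from the hunk above.
        Map<String, String> overrides = parse(List.of(
                "# TODO remove this, since it is fixed upstream.",
                "L15388=rare",
                "L299075=omit",
                "L342586=omit"));
        System.out.println(overrides);
    }
}
```

Keeping the action as a plain string avoids guessing at the full set of actions the real file format supports; only `omit` and `rare` appear in this PR.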

@@ -120,6 +120,7 @@ public String toString() {

enum Tense {
PAST,
DISTANT_PAST,
Member Author

DISTANT_PAST is a Hindi concept, needed for one word.

@grhoten grhoten merged commit 6dd3d39 into unicode-org:main Mar 10, 2025
2 checks passed
Development

Successfully merging this pull request may close these issues.

Improve Wikidata coverage in dictionary-parser