@grhoten grhoten commented Apr 12, 2025

Resolves #37

I got approval to contribute these changes. The changes include:

  • Support word decompounding for inflecting words for Danish, Dutch, Norwegian and Swedish.
  • Performance improvements for the grammar synthesizer
  • Various bug fixes and improvements in the grammar synthesizer
  • Add Vietnamese support (mostly pronouns)
  • ICU 77 is now supported
  • French and Russian no longer need the code to be patched to attempt to use Wikidata
  • Fewer errors will be seen when switching Dutch to Wikidata
  • Use more C++20 by using starts_with and ends_with on string types

@grhoten grhoten requested a review from nciric April 12, 2025 08:02
nciric added a commit that referenced this pull request Apr 14, 2025
nciric added a commit that referenced this pull request Apr 14, 2025
* Adding Wikidata as a source for Swedish. Marking dictionaries as LFS files.

* LFS attributes for sv

* Moving lfs data to local folder.

* Fixed dictionary-parser test so that it produces correct data.

* Revert code change as it's going to land with #106. 4 tests will break. Lexicon change stays.
@grhoten grhoten merged commit ad8e163 into unicode-org:main Apr 15, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

Support word decompounding for inflecting words