low-resource-languages

Here are 121 public repositories matching this topic...

RichardLitt / low-resource-languages

Resources for conservation, development, and documentation of low resource (human) languages.

nlp list natural-language-processing awesome natural-language language-learning awesome-list language-resources endangered-languages human-language language-documentation resourced-languages minority-language low-resource-languages lrls

Updated May 9, 2024
TeX

csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

multilingual machine-learning deep-learning dataset text-summarization abstractive-text-summarization abstractive-summarization text-summarisation low-resource-languages multilinguality summarization-corpora summarization-dataset multilingual-text-summarization text-summarization-dataset text-summarization-model low-resource-summarization low-resource-text-summarizarion multilingual-summarization

Updated Mar 26, 2024
Python

csebuetnlp / banglanmt

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

machine-translation neural-machine-translation parallel-corpus parallel-corpora bangla-nlp low-resource-languages bangla-machine-translation bangla-dataset-machine-translation emnlp-2020 low-resource-nlp low-resource-machine-translation

Updated Oct 23, 2024
Python

Andrews2017 / africanlp-public-datasets

A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.

natural-language-processing african-languages datasets low-resource-languages

Updated Apr 26, 2024

cisnlp / GlotLID

Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp glotcc glotlid

Updated Oct 30, 2024
Python

jcblaisecruz02 / Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

benchmark deep-learning text-classification corpus transformer transfer-learning tagalog bert filipino electra nli low-resource-languages tagalog-transformers electra-models

Updated Aug 26, 2024
Python

Rumeysakeskin / Turkish-Text-to-Speech

Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN.

pytorch tts speech-synthesis nvidia-docker waveform-generator low-resource-languages nvidia-nemo hifigan fastpitch turkish-text-to-speech phonetical-conversion spectrogram-generator

Updated Dec 5, 2023
Python

ljvmiranda921 / calamanCy

NLP pipelines for Tagalog using spaCy

nlp machine-learning natural-language-processing spacy computational-linguistics ner low-resource-languages low-resource-nlp

Updated Aug 12, 2024
Python

kbatsuren / CogNet

CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates

wordnet corpus-linguistics language-resources cognate bilingual-lexicon-extraction low-resource-languages cross-lingual-simialrity multilinguality cross-lingual-transfer bilingual-lexicon-induction

Updated Jun 15, 2023

alexandra-chron / relm_unmt

Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".

transfer-learning language-models cross-lingual low-resource-languages residual-adapters pretraining unsupervised-machine-translation

Updated Mar 16, 2022
Python

cdli-gh / Semi-Supervised-NMT-for-Sumerian-English

Exploring the Limits of Low-Resource Neural Machine Translation

translation unsupervised transformers nmt semi-supervised backtranslation xlm low-resource-languages

Updated Feb 16, 2023
Jupyter Notebook

csikasote / BembaSpeech

This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, and online sources. The corpus has 14,438 utterances, amounting to over 24 hours of speech.

automatic-speech-recognition low-resource-languages bemba

Updated May 23, 2024

hausanlp / NaijaSenti

This is a repository for NaijaSenti, a Lacuna-funded project for the development of a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.